Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-06 Thread Jeff King
On Fri, Oct 05, 2012 at 10:20:37PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
   @@ -617,6 +618,8 @@ static struct commit_list 
   *paint_down_to_common(struct commit *one, int n, struc

one-object.flags |= PARENT1;
commit_list_insert_by_date(one, list);
   +if (!n)
   +return list;
for (i = 0; i  n; i++) {
twos[i]-object.flags |= PARENT2;
commit_list_insert_by_date(twos[i], list);
 
  This seems like an obvious optimization, but does it really have
  anything to do with the patch at hand?
 
 The function picks one and paints it against all others, but the
 logic assumes there must be at least one other to paint against;
 otherwise the traversal will not ever find a node that is painted
 with both PARENT1 and PARENT2 to stop, leading us to traverse all
 the way down to root.

Ah, OK. I was thinking it was just a way to skip the further logic,
which would come to the same answer (it does, just not quickly). Makes
sense.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-05 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Here is a tested (in the sense that it passes the test suite, and
 also in the sense that an empty pull in the kernel history gives
 quick turnaround) patch.  As I do not think we would want to revert
 5802f81 (fmt-merge-msg: discard needless merge parents, 2012-04-18)
 which was a correctness fix, I think we would rather want to do
 something like this.

OK, I think I am convinced myself that this patch is the right fix.

The performance regression Markus saw is in fmt-merge-message, and
it is caused by the updated remove_redundant() that is used by
get_merge_bases_many() and reduce_heads().  On the topic branch, all
callers of reduce_heads() were passing commits that are already
parsed, but before the topic was merged to 'master', we added one
more caller to reduce_heads() on the 'master' front that passed an
unparsed commit, which is why the problem surfaced at that merge.

It might make sense to assert or die in commit_list_insert_by_date()
when a caller mistakenly pass an unparsed commit object to prevent
this kind of breakages in the future.

 -- 8 --
 Subject: paint_down_to_common(): parse commit before relying on its timestamp

 When refactoring the merge-base computation to reduce the pairwise
 O(n*(n-1)) traversals to parallel O(n) traversals, the code forgot
 that timestamp based heuristics needs each commit to have been
 parsed.  This caused an empty git pull to spend cycles, traversing
 the history all the way down to 0 (because an unparsed commit object
 has 0 timestamp, and any other commit object with positive timestamp
 will be processed for its parents, all getting parsed), only to come
 up with a merge message to be used.

 Signed-off-by: Junio C Hamano gits...@pobox.com
 ---
  commit.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git i/commit.c w/commit.c
 index 0246767..213bc98 100644
 --- i/commit.c
 +++ w/commit.c
 @@ -609,6 +609,7 @@ static struct commit *interesting(struct commit_list 
 *list)
   return NULL;
  }
  
 +/* all input commits in one and twos[] must have been parsed! */
  static struct commit_list *paint_down_to_common(struct commit *one, int n, 
 struct commit **twos)
  {
   struct commit_list *list = NULL;
 @@ -617,6 +618,8 @@ static struct commit_list *paint_down_to_common(struct 
 commit *one, int n, struc
  
   one-object.flags |= PARENT1;
   commit_list_insert_by_date(one, list);
 + if (!n)
 + return list;
   for (i = 0; i  n; i++) {
   twos[i]-object.flags |= PARENT2;
   commit_list_insert_by_date(twos[i], list);
 @@ -737,6 +740,8 @@ static int remove_redundant(struct commit **array, int 
 cnt)
   redundant = xcalloc(cnt, 1);
   filled_index = xmalloc(sizeof(*filled_index) * (cnt - 1));
  
 + for (i = 0; i  cnt; i++)
 + parse_commit(array[i]);
   for (i = 0; i  cnt; i++) {
   struct commit_list *common;
  
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-05 Thread Jeff King
On Fri, Oct 05, 2012 at 01:34:02PM -0700, Junio C Hamano wrote:

 OK, I think I am convinced myself that this patch is the right fix.
 
 The performance regression Markus saw is in fmt-merge-message, and
 it is caused by the updated remove_redundant() that is used by
 get_merge_bases_many() and reduce_heads().  On the topic branch, all
 callers of reduce_heads() were passing commits that are already
 parsed, but before the topic was merged to 'master', we added one
 more caller to reduce_heads() on the 'master' front that passed an
 unparsed commit, which is why the problem surfaced at that merge.

Thanks for tracking it down. That makes a lot of sense with the results
we are seeing.

 It might make sense to assert or die in commit_list_insert_by_date()
 when a caller mistakenly pass an unparsed commit object to prevent
 this kind of breakages in the future.

I wonder if it would be too much to just have commit_list_insert_by_date
call parse_commit. It is, after all, the exact moment when we need to
have the date valid (and by waiting until the last minute, we can
potentially avoid parses that would not otherwise need to happen). The
overhead in the common case should basically be the same as an assert:
checking that commit-object.parsed is true (we can always inline that
bit of parse_commit if we have to).

Of course, in this case it is not just commit_list_insert_by_date that
cares. paint_down_to_common also want commit-parents to be valid; I'm
surprised that dealing with unparsed commits did not also reveal an
error there.

In an object-oriented world, we would always get the attributes of a
commit through accessors that made sure the object was parsed. That
would be nicer, but it would also mean paying for the if (parsed)
conditional a lot more frequently.

  @@ -617,6 +618,8 @@ static struct commit_list *paint_down_to_common(struct 
  commit *one, int n, struc
   
  one-object.flags |= PARENT1;
  commit_list_insert_by_date(one, list);
  +   if (!n)
  +   return list;
  for (i = 0; i  n; i++) {
  twos[i]-object.flags |= PARENT2;
  commit_list_insert_by_date(twos[i], list);

This seems like an obvious optimization, but does it really have
anything to do with the patch at hand?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-05 Thread Junio C Hamano
Jeff King p...@peff.net writes:

  @@ -617,6 +618,8 @@ static struct commit_list *paint_down_to_common(struct 
  commit *one, int n, struc
   
 one-object.flags |= PARENT1;
 commit_list_insert_by_date(one, list);
  +  if (!n)
  +  return list;
 for (i = 0; i  n; i++) {
 twos[i]-object.flags |= PARENT2;
 commit_list_insert_by_date(twos[i], list);

 This seems like an obvious optimization, but does it really have
 anything to do with the patch at hand?

The function picks one and paints it against all others, but the
logic assumes there must be at least one other to paint against;
otherwise the traversal will not ever find a node that is painted
with both PARENT1 and PARENT2 to stop, leading us to traverse all
the way down to root.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Markus Trippelsdorf
Hi,

with current trunk I get the following on an up-to-date Linux tree:

markus@x4 linux % time git pull
Already up-to-date.
git pull  7.84s user 0.26s system 92% cpu 8.743 total

git version 1.7.12 is much quicker:

markus@x4 linux % time git pull
Already up-to-date.
git pull  0.10s user 0.02s system 16% cpu 0.740 total

perf shows for trunk:

22.11%  git-merge  libz.so.1.2.7  [.] 0x73bc   
22.03%git  libz.so.1.2.7  [.] 0x7338   
14.18%git  libz.so.1.2.7  [.] inflate  
13.70%  git-merge  libz.so.1.2.7  [.] inflate  
 9.18%git  git[.] 0x000ea391   
 8.56%  git-merge  git-merge  [.] 0x000f0598   
 1.58%  git-merge  libz.so.1.2.7  [.] adler32  
 1.52%git  libz.so.1.2.7  [.] adler32  
 0.59%git  [kernel.kallsyms]  [k] clear_page_c

and for 1.7.12:

39.29%git  git[.] 0x000b9fa8   
12.16%git  libz.so.1.2.7  [.] inflate  
 8.67%git  libz.so.1.2.7  [.] 0xa18e   
 8.49%  git-merge  git-merge  [.] 0x000efa15   
 4.96%git  libc-2.16.90.so[.] memcpy@@GLIBC_2.14   
 2.63%git  libc-2.16.90.so[.] _int_malloc  
 2.61%  git-merge  [kernel.kallsyms]  [k] clear_page_c 
 2.32%git  [kernel.kallsyms]  [k] clear_page_c 
 2.04%git  [kernel.kallsyms]  [k] filemap_fault
 1.87%  git-merge  libc-2.16.90.so[.] memcpy@@GLIBC_2.14  

-- 
Markus
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Jeff King
On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

Yikes. I can easily reproduce here. Bisecting between master and
v1.7.12 gives a curious result: the slowdown first occurs with the merge
commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
of its parents is slow. I don't see anything obviously suspect in the
merge, though.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

 Yikes. I can easily reproduce here. Bisecting between master and
 v1.7.12 gives a curious result: the slowdown first occurs with the merge
 commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
 of its parents is slow. I don't see anything obviously suspect in the
 merge, though.

Thanks; we probably should revert the merge before the final and
handle the fallout later, unless we know why the integrated whole is
slower than its parts.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

 Yikes. I can easily reproduce here. Bisecting between master and
 v1.7.12 gives a curious result: the slowdown first occurs with the merge
 commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
 of its parents is slow. I don't see anything obviously suspect in the
 merge, though.

There is one extra call site of reduce_heads() added by that merge
relative to 34f5130.  It is likely that the updated reduce_heads()
is totally broken from performance point of view, or at least it is
not optimized for the usage pattern at the new call site, in which
case it would be very plausible that both parents of the merge
perform well while the merge result is sucky.

It gets more curious, though.

I am getting ~9 seconds with the tip of master and ~0.4 seconds with
1.7.12.

When 34f5130^2 is reverted at the tip of master, I get ~0.35
seconds.  That matches Markus's observation.

However.

If I revert 5802f81 that updated the implementation of fmt-merge-msg
on top of 'master', *without* reverting 34f5130^2, I get ~4.5 seconds.
As we are doing an Already up-to-date pull, I thought there is no
need to call fmt-merge-msg in the first place?

Which may indicate that git merge has been broken for a long time
and making unnecessary calls.

Hrmmm...

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

 Yikes. I can easily reproduce here. Bisecting between master and
 v1.7.12 gives a curious result: the slowdown first occurs with the merge
 commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
 of its parents is slow. I don't see anything obviously suspect in the
 merge, though.

I think the following is likely to be the correct solution to this.

 commit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git i/commit.c w/commit.c
index 0246767..e1dd5b9 100644
--- i/commit.c
+++ w/commit.c
@@ -733,6 +733,9 @@ static int remove_redundant(struct commit **array, int cnt)
int *filled_index;
int i, j, filled;
 
+   if (cnt  2)
+   return cnt;
+
work = xcalloc(cnt, sizeof(*work));
redundant = xcalloc(cnt, 1);
filled_index = xmalloc(sizeof(*filled_index) * (cnt - 1));
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Jeff King p...@peff.net writes:

 On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

 Yikes. I can easily reproduce here. Bisecting between master and
 v1.7.12 gives a curious result: the slowdown first occurs with the merge
 commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
 of its parents is slow. I don't see anything obviously suspect in the
 merge, though.

 I think the following is likely to be the correct solution to this.

No, it is not.  Sorry for the noise.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git pull takes ~8 seconds on up-to-date Linux git tree

2012-10-04 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Junio C Hamano gits...@pobox.com writes:

 Jeff King p...@peff.net writes:

 On Thu, Oct 04, 2012 at 04:14:54PM +0200, Markus Trippelsdorf wrote:

 with current trunk I get the following on an up-to-date Linux tree:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  7.84s user 0.26s system 92% cpu 8.743 total
 
 git version 1.7.12 is much quicker:
 
 markus@x4 linux % time git pull
 Already up-to-date.
 git pull  0.10s user 0.02s system 16% cpu 0.740 total

 Yikes. I can easily reproduce here. Bisecting between master and
 v1.7.12 gives a curious result: the slowdown first occurs with the merge
 commit 34f5130 (Merge branch 'jc/merge-bases', 2012-09-11). But neither
 of its parents is slow. I don't see anything obviously suspect in the
 merge, though.

 I think the following is likely to be the correct solution to this.

 No, it is not.  Sorry for the noise.

Here is a tested (in the sense that it passes the test suite, and
also in the sense that an empty pull in the kernel history gives
quick turnaround) patch.  As I do not think we would want to revert
5802f81 (fmt-merge-msg: discard needless merge parents, 2012-04-18)
which was a correctness fix, I think we would rather want to do
something like this.

-- 8 --
Subject: paint_down_to_common(): parse commit before relying on its timestamp

When refactoring the merge-base computation to reduce the pairwise
O(n*(n-1)) traversals to parallel O(n) traversals, the code forgot
that timestamp based heuristics needs each commit to have been
parsed.  This caused an empty git pull to spend cycles, traversing
the history all the way down to 0 (because an unparsed commit object
has 0 timestamp, and any other commit object with positive timestamp
will be processed for its parents, all getting parsed), only to come
up with a merge message to be used.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 commit.c | 5 +
 1 file changed, 5 insertions(+)

diff --git i/commit.c w/commit.c
index 0246767..213bc98 100644
--- i/commit.c
+++ w/commit.c
@@ -609,6 +609,7 @@ static struct commit *interesting(struct commit_list *list)
return NULL;
 }
 
+/* all input commits in one and twos[] must have been parsed! */
 static struct commit_list *paint_down_to_common(struct commit *one, int n, 
struct commit **twos)
 {
struct commit_list *list = NULL;
@@ -617,6 +618,8 @@ static struct commit_list *paint_down_to_common(struct 
commit *one, int n, struc
 
one-object.flags |= PARENT1;
commit_list_insert_by_date(one, list);
+   if (!n)
+   return list;
for (i = 0; i  n; i++) {
twos[i]-object.flags |= PARENT2;
commit_list_insert_by_date(twos[i], list);
@@ -737,6 +740,8 @@ static int remove_redundant(struct commit **array, int cnt)
redundant = xcalloc(cnt, 1);
filled_index = xmalloc(sizeof(*filled_index) * (cnt - 1));
 
+   for (i = 0; i  cnt; i++)
+   parse_commit(array[i]);
for (i = 0; i  cnt; i++) {
struct commit_list *common;
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html