Re: Huge performance bottleneck reading packs

2016-10-14 Thread Jeff King
On Fri, Oct 14, 2016 at 08:55:29AM +0200, Vegard Nossum wrote:

> On 10/13/2016 10:43 PM, Jeff King wrote:
> > No problem. I do think you'll benefit a lot from packing everything into
> > a single pack, but it's also clear that Git was doing more work than it
> > needed to be. This was a known issue when we added the racy-check to
> > has_sha1_file(), and knew that we might have to field reports like this
> > one.
> 
> After 'git gc' (with the .5G limit) I have 23 packs and with your patch
> I now get:
> 
> $ time git fetch
> 
> real    0m26.520s
> user    0m3.580s
> sys     0m0.636s

Cool. That's about what I'd expect based on the size numbers you gave
earlier. The other 23 seconds are likely being spent on passing the ref
advertisement over the network.
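
For reference, a rough way to gauge the size of that advertisement,
assuming "origin" is the remote in question:

  git ls-remote origin | wc -l     # number of refs advertised on every fetch
  git ls-remote origin | wc -c     # approximate bytes of the advertisement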

-Peff


Re: Huge performance bottleneck reading packs

2016-10-14 Thread Jakub Narębski
On 13.10.2016 at 11:04, Vegard Nossum wrote:
> On 10/13/2016 01:47 AM, Jeff King wrote:
>> On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:
>>
>>> Also, is it possible to make the repository in question available? I
>>> might be able to reproduce based on your description, but it would save
>>> time if I could directly run gdb on your example.
> 
> I won't be able to make the repository available, sorry.

We have 'git fast-export --anonymize ...', but it would obviously
not preserve the pack structure (though it can help in other cases
when one cannot make the repository available).
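
For completeness, a minimal sketch of that workflow (the paths below are
arbitrary; the anonymized copy keeps the shape of history but scrambles
contents, paths and refnames, and gets a fresh pack layout):

  git fast-export --anonymize --all >/tmp/anon.stream
  git init /tmp/anon-repo
  git -C /tmp/anon-repo fast-import </tmp/anon.stream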

-- 
Jakub Narębski



Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 10:43 PM, Jeff King wrote:

> No problem. I do think you'll benefit a lot from packing everything into
> a single pack, but it's also clear that Git was doing more work than it
> needed to be. This was a known issue when we added the racy-check to
> has_sha1_file(), and knew that we might have to field reports like this
> one.


After 'git gc' (with the .5G limit) I have 23 packs and with your patch
I now get:

$ time git fetch

real    0m26.520s
user    0m3.580s
sys     0m0.636s

Thanks!


vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 08:18:11PM +0200, Vegard Nossum wrote:

> > My guess is that the number is relatively high. And that would explain
> > why nobody else has really complained much; such a pattern is probably
> > uncommon.
> 
> I get ~3,700 objects "they are advertising that we don't have".
> 
> They are all indeed tags which I don't have (nor want) locally.

Thanks. That's about what I expected.

> So without your patch I get these numbers:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds      calls   s/call   s/call  name
>  37.34     29.83    29.83 1948265116     0.00     0.00  strip_suffix_mem
>  27.56     51.85    22.02      11358     0.00     0.01  prepare_packed_git_one
>  14.24     63.23    11.38 1924457702     0.00     0.00  strip_suffix
>   4.88     67.13     3.90   26045319     0.00     0.00  find_pack_entry_one
>   4.17     70.46     3.33   84828149     0.00     0.00  hashcmp
>   3.28     73.08     2.62   79760199     0.00     0.00  hashcmp
>   2.44     75.03     1.95                              const_error
> 
> the call graph data shows all the reprepare_packed_git() calls to come
> from a chain like this:
> 
> do_for_each_ref
> do_for_each_ref_iterator
> ref_iterator_advance
> files_ref_iterator_advance
> ref_resolves_to_object
> has_sha1_file
> has_sha1_file_with_flags
> reprepare_packed_git

Hrm. It seems weird that we'd hit reprepare_packed_git() via
do_for_each_ref(), because that's looking at _your_ refs. So any time
that code path hits reprepare_packed_git(), it should be complaining
about repository corruption to stderr.

And it also wouldn't make sense that my patch would improve it. Are
you sure you've read the perf output correctly (I'll admit that I am
often confused by the way it reports call graphs)?

> With your (first) patch I get this instead:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  29.41      0.25     0.25  4490399     0.00     0.00  hashcmp
>  16.47      0.39     0.14   843447     0.00     0.00  find_pack_entry_one

These functions appearing at the top are probably a sign that you have
too many packs (these are just object lookup, which has to linearly try
each pack).

So I'd expect things to improve after a git-gc (and especially if you
turn off the packSizeLimit).
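
Roughly, assuming the 512m limit from earlier in the thread is set in the
repository config, that would be something like:

  git config --unset pack.packSizeLimit
  git repack -a -d -f      # or: git gc --aggressive
  git count-objects -v     # "packs:" should drop to 1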

> Do you have all the profile data you were interested in before I try a
> 'git gc'?

Yes, I think so. At least I'm satisfied that there's not another
HAS_SHA1_QUICK case that I'm missing.

> I really appreciate the quick response and the work you put into fixing
> this, even when it's my fault for disabling gc, thanks!

No problem. I do think you'll benefit a lot from packing everything into
a single pack, but it's also clear that Git was doing more work than it
needed to be. This was a known issue when we added the racy-check to
has_sha1_file(), and knew that we might have to field reports like this
one.

-Peff


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 05:26 PM, Jeff King wrote:

> On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:
> 
> > > Does the patch below help?
> > 
> > Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> > variation in time may be due to different CPU usage by other programs,
> > but I ran with/without the patch multiple times and the difference is
> > consistent).
> > [...]
> > There are some 20k refs on the remote, closer to 25k locally.
> 
> OK, that makes some sense. For whatever reason, your remote has a bunch
> of tags that point to objects you do not already have. That could
> happen, I think, if the remote added a lot of tags since you cloned
> (because clone will grab all of the tags), but those tags do not point
> to history that you are otherwise fetching (since fetch by default will
> "auto-follow" such tags).
> 
> It might be interesting to see the results of:
> 
>   # all the objects we have
>   git cat-file --batch-all-objects --batch-check='%(objectname)' >us
> 
>   # all the objects the remote is advertising
>   git ls-remote origin | cut -f1 | sort -u >them
> 
>   # what they are advertising that we don't have
>   comm -13 us them | wc -l
> 
> My guess is that the number is relatively high. And that would explain
> why nobody else has really complained much; such a pattern is probably
> uncommon.


I get ~3,700 objects "they are advertising that we don't have".

They are all indeed tags which I don't have (nor want) locally.


> > The fetch doesn't actually get anything from the remote as everything is
> > already up to date (that makes the 2m40s times even more frustrating in
> > a way :-)). Here's count-objects:
> 
> If the fetch is largely a noop, then that makes me wonder why we are
> spending even a minute in the "good" case. I wonder if there is some

It was user time I reported and this is over network, so some of it
might be IO wait on the network? With the (first) patch:

$ time git fetch

real    1m9.248s
user    0m40.808s
sys     0m2.996s

Also see the profile data below.


> other spot that is wasting CPU on some inefficient data structure
> related to the number of refs you have. If you can do any profiling that
> points to a hot spot, that would be interesting to see (and also whether
> a gc improves things).
> 
> I see in find_non_local_tags() that we build up a sorted string_list via
> repeated calls to string_list_insert(), which will keep the thing sorted
> at each stage. That's not as efficient as just sorting at the end, but I
> think it's probably OK in practice because we actually feed it via
> for_each_ref(), whose output is sorted, and so we'd always just be
> appending to the end.


So without your patch I get these numbers:

  %   cumulative   self              self     total
 time   seconds   seconds      calls   s/call   s/call  name
 37.34     29.83    29.83 1948265116     0.00     0.00  strip_suffix_mem
 27.56     51.85    22.02      11358     0.00     0.01  prepare_packed_git_one
 14.24     63.23    11.38 1924457702     0.00     0.00  strip_suffix
  4.88     67.13     3.90   26045319     0.00     0.00  find_pack_entry_one
  4.17     70.46     3.33   84828149     0.00     0.00  hashcmp
  3.28     73.08     2.62   79760199     0.00     0.00  hashcmp
  2.44     75.03     1.95                              const_error

the call graph data shows all the reprepare_packed_git() calls to come
from a chain like this:

do_for_each_ref
do_for_each_ref_iterator
ref_iterator_advance
files_ref_iterator_advance
ref_resolves_to_object
has_sha1_file
has_sha1_file_with_flags
reprepare_packed_git

the do_for_each_ref() calls come from prepare_replace_object(),
do_fetch(), find_common(), everything_local(),
check_for_new_submodule_commits(), and find_non_local_tags().

With your (first) patch I get this instead:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 29.41      0.25     0.25  4490399     0.00     0.00  hashcmp
 16.47      0.39     0.14   843447     0.00     0.00  find_pack_entry_one
 10.59      0.48     0.09    60490     0.00     0.00  unpack_object_header_buffer
  4.71      0.52     0.04   167441     0.00     0.00  lookup_object
  3.53      0.55     0.03   843447     0.00     0.00  fill_pack_entry
  3.53      0.58     0.03   213169     0.00     0.00  check_refname_component
  2.35      0.60     0.02   550723     0.00     0.00  strip_suffix_mem
  2.35      0.62     0.02   214607     0.00     0.00  insert_obj_hash
  2.35      0.64     0.02   158866     0.00     0.00  hashcmp
  2.35      0.66     0.02   153936     0.00     0.00  nth_packed_object_offset


Do you have all the profile data you were interested in before I try a
'git gc'?

I really appreciate the quick response and the work you put into fixing
this, even when it's my fault for disabling gc, thanks!


Vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:

> > Does the patch below help?
> 
> Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> variation in time may be due to different CPU usage by other programs,
> but I ran with/without the patch multiple times and the difference is
> consistent).
> [...]
> There are some 20k refs on the remote, closer to 25k locally.

OK, that makes some sense. For whatever reason, your remote has a bunch
of tags that point to objects you do not already have. That could
happen, I think, if the remote added a lot of tags since you cloned
(because clone will grab all of the tags), but those tags do not point
to history that you are otherwise fetching (since fetch by default will
"auto-follow" such tags).

It might be interesting to see the results of:

  # all the objects we have
  git cat-file --batch-all-objects --batch-check='%(objectname)' >us

  # all the objects the remote is advertising
  git ls-remote origin | cut -f1 | sort -u >them

  # what they are advertising that we don't have
  comm -13 us them | wc -l

My guess is that the number is relatively high. And that would explain
why nobody else has really complained much; such a pattern is probably
uncommon.

> The fetch doesn't actually get anything from the remote as everything is
> already up to date (that makes the 2m40s times even more frustrating in
> a way :-)). Here's count-objects:

If the fetch is largely a noop, then that makes me wonder why we are
spending even a minute in the "good" case. I wonder if there is some
other spot that is wasting CPU on some inefficient data structure
related to the number of refs you have. If you can do any profiling that
points to a hot spot, that would be interesting to see (and also whether
a gc improves things).
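
One quick way to get such a profile, assuming perf is available (the
gprof-style output later in the thread is another option):

  perf record -g -- git fetch
  perf report                # look for the functions with the highest self%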

I see in find_non_local_tags() that we build up a sorted string_list via
repeated calls to string_list_insert(), which will keep the thing sorted
at each stage. That's not as efficient as just sorting at the end, but I
think it's probably OK in practice because we actually feed it via
for_each_ref(), whose output is sorted, and so we'd always just be
appending to the end.

-Peff


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 09:17:34AM +0200, Vegard Nossum wrote:

> Oops. I disabled gc a while ago; one reason I did that is that it takes
> a long time to run and it has a tendency to kick in at the worst time. I
> guess I should really put it in cron then.
> 
> I'm not sure if this is related, but I also had a problem with GitPython
> and large pack files in the past ("ValueError: Couldn't obtain fanout
> table" or "warning: packfile ./objects/pack/pack cannot be accessed")

Sounds like they didn't correctly implement the extra index fanout that
happens for packs above 2G. The old Grit library had a similar bug.
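
A quick way to check whether any local pack actually crosses that
threshold (GNU find syntax):

  find .git/objects/pack -name '*.pack' -size +2G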

> and I have pack.packSizeLimit set to 512m to fix that.
> Although the whole repo is 17G so I guess it shouldn't be necessary to
> have that many pack files.

Using packSizeLimit does "solve" that problem, but it comes with its own
set of issues. There is a very good chance that your repository would be
much smaller than 17G as a single packfile, because Git does not allow
deltas across packs, and it does not optimize the placement of objects
to keep delta-related objects in a single pack. So you'll quite often be
storing full copies of objects that could otherwise be stored as a tiny
delta.

You might want to compare the resulting size for a full repack with and
without pack.packSizeLimit.
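
A sketch of that comparison (best done in a throwaway clone, since a full
repack of a 17G repository will take a while):

  # no size limit: everything goes into a single pack
  git repack -a -d -f
  du -sh .git/objects/pack

  # with the 512m cap
  git -c pack.packSizeLimit=512m repack -a -d -f
  du -sh .git/objects/pack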

But I agree that is not the cause of your thousand packs. They are more
likely the accumulated cruft of a thousand fetches.

-Peff


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 12:45 AM, Junio C Hamano wrote:
> Vegard Nossum  writes:
>
>> A closer inspection reveals the problem to really be that this is an
>> extremely hot path with more than -- holy cow -- 4,106,756,451
>> iterations on the 'packed_git' list for a single 'git fetch' on my
>> repository. I'm guessing the patch above just made the inner loop
>> ever so slightly slower.
>
> Very plausible, and this ...
>
>> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
>> and 4 tmp_pack_* files).
>
> ... may explain why nobody else has seen a difference.
>
> Is there a reason why your repository has that many pack files?  Is
> automatic GC not working for some reason?

Oops. I disabled gc a while ago; one reason I did that is that it takes
a long time to run and it has a tendency to kick in at the worst time. I
guess I should really put it in cron then.
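
Something along these lines would do (the path is of course hypothetical):

  # nightly gc at 03:00; --quiet keeps the cron mail down
  0 3 * * * cd /path/to/repo && git gc --quiet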

I'm not sure if this is related, but I also had a problem with GitPython
and large pack files in the past ("ValueError: Couldn't obtain fanout
table" or "warning: packfile ./objects/pack/pack cannot be accessed"),
and I have pack.packSizeLimit set to 512m to fix that.
The whole repo is 17G, though, so I guess it shouldn't be necessary to
have that many pack files.

Will try Jeff's patch, then a gc. Thanks!


Vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 01:47 AM, Jeff King wrote:

> On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:
> 
> > Also, is it possible to make the repository in question available? I
> > might be able to reproduce based on your description, but it would save
> > time if I could directly run gdb on your example.


I won't be able to make the repository available, sorry.


> I tried this by making a bunch of packs in linux.git (my standard "this
> is pretty big" repo), like so:
> 
>   for i in $(seq 1000); do
>     git rev-list --objects HEAD~$((i+1))..HEAD~$i |
>     git pack-objects --delta-base-offset .git/objects/pack/pack
>   done
> 
> and then doing a 25,000-object fetch from upstream (no significance to
> the number; that's just how far behind upstream I happened to be).
> 
> However, I didn't notice any regression. In fact, it was much _slower_
> than v1.9.0, because that older version didn't have threaded index-pack.
> 
> If you can't share the repo directly, can you tell us more about your
> fetch? How many objects are in your repository? How many objects are
> fetched? How many refs are there on the remote side?


The fetch doesn't actually get anything from the remote as everything is
already up to date (that makes the 2m40s times even more frustrating in
a way :-)). Here's count-objects:

$ git count-objects -v
warning: garbage found: .git/objects/pack/tmp_pack_pAZcu4
warning: garbage found: .git/objects/pack/tmp_pack_KhzrrI
warning: garbage found: .git/objects/pack/tmp_pack_mycfro
warning: garbage found: .git/objects/pack/tmp_pack_2kxKOn
count: 51609
size: 288768
in-pack: 23902336
packs: 1044
size-pack: 16588157
prune-packable: 48628
garbage: 4
size-garbage: 84792

There are some 20k refs on the remote, closer to 25k locally.

I'll try to get a profile (without your patch) before doing a gc run.


Vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 01:01 AM, Jeff King wrote:

> On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:
> 
> > However, the commit found by 'git blame' above appears just fine to me,
> > I haven't been able to spot a bug in it.
> > 
> > A closer inspection reveals the problem to really be that this is an
> > extremely hot path with more than -- holy cow -- 4,106,756,451
> > iterations on the 'packed_git' list for a single 'git fetch' on my
> > repository. I'm guessing the patch above just made the inner loop
> > ever so slightly slower.
> > 
> > My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> > and 4 tmp_pack_* files).


> Yeah. I agree that the commit you found makes the check a little more
> expensive, but I think the root of the problem is calling
> prepare_packed_git_one many times. This _should_ happen once for each
> pack at program startup, and possibly again if we need to re-scan the
> pack directory to account for racing with a simultaneous repack.
> 
> The latter is generally triggered when we fail to look up an object we
> expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
> directory before giving up, 2013-08-30) is playing a part. We dealt with
> that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
> of pack directory, 2015-06-09), but it would not surprise me if there is
> another spot that needs similar treatment.
> 
> Does the patch below help?


Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
variation in time may be due to different CPU usage by other programs,
but I ran with/without the patch multiple times and the difference is
consistent).

Thanks,


Vegard


Re: Huge performance bottleneck reading packs

2016-10-12 Thread Jeff King
On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:

> Also, is it possible to make the repository in question available? I
> might be able to reproduce based on your description, but it would save
> time if I could directly run gdb on your example.

I tried this by making a bunch of packs in linux.git (my standard "this
is pretty big" repo), like so:

  for i in $(seq 1000); do
git rev-list --objects HEAD~$((i+1))..HEAD~$i |
git pack-objects --delta-base-offset .git/objects/pack/pack
  done

and then doing a 25,000-object fetch from upstream (no significance to
the number; that's just how far behind upstream I happened to be).

However, I didn't notice any regression. In fact, it was much _slower_
than v1.9.0, because that older version didn't have threaded index-pack.

If you can't share the repo directly, can you tell us more about your
fetch? How many objects are in your repository? How many objects are
fetched? How many refs are there on the remote side?

-Peff


Re: Huge performance bottleneck reading packs

2016-10-12 Thread Jeff King
On Wed, Oct 12, 2016 at 07:01:43PM -0400, Jeff King wrote:

> On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:
> 
> > However, the commit found by 'git blame' above appears just fine to me,
> > I haven't been able to spot a bug in it.
> > 
> > A closer inspection reveals the problem to really be that this is an
> > extremely hot path with more than -- holy cow -- 4,106,756,451
> > iterations on the 'packed_git' list for a single 'git fetch' on my
> > repository. I'm guessing the patch above just made the inner loop
> > ever so slightly slower.
> > 
> > My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> > and 4 tmp_pack_* files).
> 
> Yeah. I agree that the commit you found makes the check a little more
> expensive, but I think the root of the problem is calling
> prepare_packed_git_one many times. This _should_ happen once for each
> pack at program startup, and possibly again if we need to re-scan the
> pack directory to account for racing with a simultaneous repack.
> 
> The latter is generally triggered when we fail to look up an object we
> expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
> directory before giving up, 2013-08-30) is playing a part. We dealt with
> that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
> of pack directory, 2015-06-09), but it would not surprise me if there is
> another spot that needs similar treatment.
> 
> Does the patch below help?

Also, is it possible to make the repository in question available? I
might be able to reproduce based on your description, but it would save
time if I could directly run gdb on your example.

Specifically, I'm interested in the call graph that leads to calling
reprepare_packed_git().
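
For reference, one way to capture that with gdb (a sketch; the breakpoint
simply names the function discussed above):

  gdb --args git fetch
  (gdb) break reprepare_packed_git
  (gdb) run
  (gdb) backtrace     # prints the call chain each time the re-scan triggers
  (gdb) continue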

-Peff


Re: Huge performance bottleneck reading packs

2016-10-12 Thread Jeff King
On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:

> However, the commit found by 'git blame' above appears just fine to me,
> I haven't been able to spot a bug in it.
> 
> A closer inspection reveals the problem to really be that this is an
> extremely hot path with more than -- holy cow -- 4,106,756,451
> iterations on the 'packed_git' list for a single 'git fetch' on my
> repository. I'm guessing the patch above just made the inner loop
> ever so slightly slower.
> 
> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> and 4 tmp_pack_* files).

Yeah. I agree that the commit you found makes the check a little more
expensive, but I think the root of the problem is calling
prepare_packed_git_one many times. This _should_ happen once for each
pack at program startup, and possibly again if we need to re-scan the
pack directory to account for racing with a simultaneous repack.

The latter is generally triggered when we fail to look up an object we
expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
directory before giving up, 2013-08-30) is playing a part. We dealt with
that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
of pack directory, 2015-06-09), but it would not surprise me if there is
another spot that needs similar treatment.

Does the patch below help?

diff --git a/builtin/fetch.c b/builtin/fetch.c
index d5329f9..c0f3c2c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -243,7 +243,7 @@ static void find_non_local_tags(struct transport *transport,
 		if (ends_with(ref->name, "^{}")) {
 			if (item && !has_object_file(&ref->old_oid) &&
 			    !will_fetch(head, ref->old_oid.hash) &&
-			    !has_sha1_file(item->util) &&
+			    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 			    !will_fetch(head, item->util))
 				item->util = NULL;
 			item = NULL;
@@ -256,7 +256,8 @@ static void find_non_local_tags(struct transport *transport,
 		 * to check if it is a lightweight tag that we want to
 		 * fetch.
 		 */
-		if (item && !has_sha1_file(item->util) &&
+		if (item &&
+		    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 		    !will_fetch(head, item->util))
 			item->util = NULL;
 
@@ -276,7 +277,8 @@ static void find_non_local_tags(struct transport *transport,
 		 * We may have a final lightweight tag that needs to be
 		 * checked to see if it needs fetching.
 		 */
-		if (item && !has_sha1_file(item->util) &&
+		if (item &&
+		    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 		    !will_fetch(head, item->util))
 			item->util = NULL;
 

-Peff


Re: Huge performance bottleneck reading packs

2016-10-12 Thread Junio C Hamano
Vegard Nossum  writes:

> A closer inspection reveals the problem to really be that this is an
> extremely hot path with more than -- holy cow -- 4,106,756,451
> iterations on the 'packed_git' list for a single 'git fetch' on my
> repository. I'm guessing the patch above just made the inner loop
> ever so slightly slower.

Very plausible, and this ...

> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> and 4 tmp_pack_* files).

... may explain why nobody else has seen a difference.  

Is there a reason why your repository has that many pack files?  Is
automatic GC not working for some reason?

"gc" would try to make sure that you have reasonably low number of
packs, as having too many packs is detrimental for performance for
multiple reasons, including:

 * Any object stored in a pack in delta format (i.e. only the
   difference from another object is stored) must ultimately be based
   on an object that is recorded in full form in the same packfile.

 * A single packfile records a single object only once, but it is
   normal (and often required because of the point above) that the
   same object appears in multiple packfiles.

 * Locating an object in a single packfile uses its .idx file (a
   binary search over a sorted list of object names), which is
   efficient, but this cost is multiplied linearly by the number of
   packs you have in your repository.
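
A rough way to see the duplication from the second point in an actual
repository is to count objects that appear in more than one pack, e.g.:

  for idx in .git/objects/pack/*.idx; do
      git show-index <"$idx" | cut -d' ' -f2
  done | sort | uniq -d | wc -l

A "git gc" (or "git repack -a -d") should bring both that count and the
number of packs back down.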