Re: Huge performance bottleneck reading packs
On Fri, Oct 14, 2016 at 08:55:29AM +0200, Vegard Nossum wrote:

> On 10/13/2016 10:43 PM, Jeff King wrote:
> > No problem. I do think you'll benefit a lot from packing everything into
> > a single pack, but it's also clear that Git was doing more work than it
> > needed to be. This was a known issue when we added the racy-check to
> > has_sha1_file(), and knew that we might have to field reports like this
> > one.
>
> After 'git gc' (with the .5G limit) I have 23 packs and with your patch
> I now get:
>
> $ time git fetch
>
> real    0m26.520s
> user    0m3.580s
> sys     0m0.636s

Cool. That's about what I'd expect based on the size numbers you gave
earlier. The other 23 seconds are likely being spent on passing the ref
advertisement over the network.

-Peff
Re: Huge performance bottleneck reading packs
On 13.10.2016 at 11:04, Vegard Nossum wrote:

> On 10/13/2016 01:47 AM, Jeff King wrote:
>> On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:
>>
>>> Also, is it possible to make the repository in question available? I
>>> might be able to reproduce based on your description, but it would save
>>> time if I could directly run gdb on your example.
>
> I won't be able to make the repository available, sorry.

We have 'git fast-export --anonymize ...', but it would obviously not
preserve the pack structure (though it can help in other cases when one
cannot make a repository available).

--
Jakub Narębski
Re: Huge performance bottleneck reading packs
On 10/13/2016 10:43 PM, Jeff King wrote:
> No problem. I do think you'll benefit a lot from packing everything into
> a single pack, but it's also clear that Git was doing more work than it
> needed to be. This was a known issue when we added the racy-check to
> has_sha1_file(), and knew that we might have to field reports like this
> one.

After 'git gc' (with the .5G limit) I have 23 packs and with your patch
I now get:

$ time git fetch

real    0m26.520s
user    0m3.580s
sys     0m0.636s

Thanks!

Vegard
Re: Huge performance bottleneck reading packs
On Thu, Oct 13, 2016 at 08:18:11PM +0200, Vegard Nossum wrote:

>> My guess is that the number is relatively high. And that would explain
>> why nobody else has really complained much; such a pattern is probably
>> uncommon.
>
> I get ~3,700 objects "they are advertising that we don't have".
>
> They are all indeed tags which I don't have (nor want) locally.

Thanks. That's about what I expected.

> So without your patch I get these numbers:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds      calls   s/call   s/call  name
>  37.34     29.83    29.83 1948265116     0.00     0.00  strip_suffix_mem
>  27.56     51.85    22.02      11358     0.00     0.01  prepare_packed_git_one
>  14.24     63.23    11.38 1924457702     0.00     0.00  strip_suffix
>   4.88     67.13     3.90   26045319     0.00     0.00  find_pack_entry_one
>   4.17     70.46     3.33   84828149     0.00     0.00  hashcmp
>   3.28     73.08     2.62   79760199     0.00     0.00  hashcmp
>   2.44     75.03     1.95                              const_error
>
> The call graph data shows all the reprepare_packed_git() calls to come
> from a chain like this:
>
>   do_for_each_ref
>   do_for_each_ref_iterator
>   ref_iterator_advance
>   files_ref_iterator_advance
>   ref_resolves_to_object
>   has_sha1_file
>   has_sha1_file_with_flags
>   reprepare_packed_git

Hrm. It seems weird that we'd hit reprepare_packed_git() via
do_for_each_ref(), because that's looking at _your_ refs. So any time
that code path hits reprepare_packed_git(), it should be complaining
about repository corruption to stderr. And it also wouldn't make sense
for my patch to improve it. Are you sure you've read the perf output
correctly (I'll admit that I am often confused by the way it reports
call graphs)?

> With your (first) patch I get this instead:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  29.41      0.25     0.25  4490399     0.00     0.00  hashcmp
>  16.47      0.39     0.14   843447     0.00     0.00  find_pack_entry_one

These functions appearing at the top are probably a sign that you have
too many packs (these are just object lookup, which has to linearly try
each pack). So I'd expect things to improve after a git gc (and
especially if you turn off the packSizeLimit).

> Do you have all the profile data you were interested in before I try a
> 'git gc'?

Yes, I think so. At least I'm satisfied that there's not another
HAS_SHA1_QUICK case that I'm missing.

> I really appreciate the quick response and the work you put into fixing
> this, even when it's my fault for disabling gc, thanks!

No problem. I do think you'll benefit a lot from packing everything into
a single pack, but it's also clear that Git was doing more work than it
needed to be. This was a known issue when we added the racy-check to
has_sha1_file(), and knew that we might have to field reports like this
one.

-Peff
Re: Huge performance bottleneck reading packs
On 10/13/2016 05:26 PM, Jeff King wrote:
> On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:
>
>>> Does the patch below help?
>>
>> Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
>> variation in time may be due to different CPU usage by other programs,
>> but I ran with/without the patch multiple times and the difference is
>> consistent).
>> [...]
>> There are some 20k refs on the remote, closer to 25k locally.
>
> OK, that makes some sense. For whatever reason, your remote has a bunch
> of tags that point to objects you do not already have. That could
> happen, I think, if the remote added a lot of tags since you cloned
> (because clone will grab all of the tags), but those tags do not point
> to history that you are otherwise fetching (since fetch by default will
> "auto-follow" such tags).
>
> It might be interesting to see the results of:
>
>   # all the objects we have
>   git cat-file --batch-all-objects --batch-check='%(objectname)' >us
>
>   # all the objects the remote is advertising
>   git ls-remote origin | cut -f1 | sort -u >them
>
>   # what they are advertising that we don't have
>   comm -13 us them | wc -l
>
> My guess is that the number is relatively high. And that would explain
> why nobody else has really complained much; such a pattern is probably
> uncommon.

I get ~3,700 objects "they are advertising that we don't have".

They are all indeed tags which I don't have (nor want) locally.

>> The fetch doesn't actually get anything from the remote as everything
>> is already up to date (that makes the 2m40s times even more
>> frustrating in a way :-)). Here's count-objects:
>
> If the fetch is largely a noop, then that makes me wonder why we are
> spending even a minute in the "good" case. I wonder if there is some
> other spot that is wasting CPU on some inefficient data structure
> related to the number of refs you have.

It was user time I reported and this is over the network, so some of it
might be IO wait on the network? With the (first) patch:

$ time git fetch

real    1m9.248s
user    0m40.808s
sys     0m2.996s

Also see the profile data below.

> If you can do any profiling that points to a hot spot, that would be
> interesting to see (and also whether a gc improves things).
>
> I see in find_non_local_tags() that we build up a sorted string_list
> via repeated calls to string_list_insert(), which will keep the thing
> sorted at each stage. That's not as efficient as just sorting at the
> end, but I think it's probably OK in practice because we actually feed
> it via for_each_ref(), whose output is sorted, and so we'd always just
> be appending to the end.

So without your patch I get these numbers:

  %   cumulative   self              self     total
 time   seconds   seconds      calls   s/call   s/call  name
 37.34     29.83    29.83 1948265116     0.00     0.00  strip_suffix_mem
 27.56     51.85    22.02      11358     0.00     0.01  prepare_packed_git_one
 14.24     63.23    11.38 1924457702     0.00     0.00  strip_suffix
  4.88     67.13     3.90   26045319     0.00     0.00  find_pack_entry_one
  4.17     70.46     3.33   84828149     0.00     0.00  hashcmp
  3.28     73.08     2.62   79760199     0.00     0.00  hashcmp
  2.44     75.03     1.95                              const_error

The call graph data shows all the reprepare_packed_git() calls to come
from a chain like this:

  do_for_each_ref
  do_for_each_ref_iterator
  ref_iterator_advance
  files_ref_iterator_advance
  ref_resolves_to_object
  has_sha1_file
  has_sha1_file_with_flags
  reprepare_packed_git

The do_for_each_ref() calls come from prepare_replace_object(),
do_fetch(), find_common(), everything_local(),
check_for_new_submodule_commits(), and find_non_local_tags().

With your (first) patch I get this instead:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 29.41      0.25     0.25  4490399     0.00     0.00  hashcmp
 16.47      0.39     0.14   843447     0.00     0.00  find_pack_entry_one
 10.59      0.48     0.09    60490     0.00     0.00  unpack_object_header_buffer
  4.71      0.52     0.04   167441     0.00     0.00  lookup_object
  3.53      0.55     0.03   843447     0.00     0.00  fill_pack_entry
  3.53      0.58     0.03   213169     0.00     0.00  check_refname_component
  2.35      0.60     0.02   550723     0.00     0.00  strip_suffix_mem
  2.35      0.62     0.02   214607     0.00     0.00  insert_obj_hash
  2.35      0.64     0.02   158866     0.00     0.00  hashcmp
  2.35      0.66     0.02   153936     0.00     0.00  nth_packed_object_offset

Do you have all the profile data you were interested in before I try a
'git gc'?

I really appreciate the quick response and the work you put into fixing
this, even when it's my fault for disabling gc, thanks!

Vegard
Re: Huge performance bottleneck reading packs
On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:

> > Does the patch below help?
>
> Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> variation in time may be due to different CPU usage by other programs,
> but I ran with/without the patch multiple times and the difference is
> consistent).
> [...]
> There are some 20k refs on the remote, closer to 25k locally.

OK, that makes some sense. For whatever reason, your remote has a bunch
of tags that point to objects you do not already have. That could
happen, I think, if the remote added a lot of tags since you cloned
(because clone will grab all of the tags), but those tags do not point
to history that you are otherwise fetching (since fetch by default will
"auto-follow" such tags).

It might be interesting to see the results of:

  # all the objects we have
  git cat-file --batch-all-objects --batch-check='%(objectname)' >us

  # all the objects the remote is advertising
  git ls-remote origin | cut -f1 | sort -u >them

  # what they are advertising that we don't have
  comm -13 us them | wc -l

My guess is that the number is relatively high. And that would explain
why nobody else has really complained much; such a pattern is probably
uncommon.

> The fetch doesn't actually get anything from the remote as everything
> is already up to date (that makes the 2m40s times even more frustrating
> in a way :-)). Here's count-objects:

If the fetch is largely a noop, then that makes me wonder why we are
spending even a minute in the "good" case. I wonder if there is some
other spot that is wasting CPU on some inefficient data structure
related to the number of refs you have. If you can do any profiling that
points to a hot spot, that would be interesting to see (and also whether
a gc improves things).

I see in find_non_local_tags() that we build up a sorted string_list via
repeated calls to string_list_insert(), which will keep the thing sorted
at each stage. That's not as efficient as just sorting at the end, but I
think it's probably OK in practice because we actually feed it via
for_each_ref(), whose output is sorted, and so we'd always just be
appending to the end.

-Peff
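Peff's last point, that already-sorted input makes a keep-sorted insert
degenerate into plain appends, can be sketched with a toy model (Python
as a stand-in for git's C string_list; the names here are illustrative,
not git's API):

```python
import bisect

# Toy stand-in for string_list_insert(): binary-search the position,
# then insert there (an interior insert shifts every later element, O(n)).
def insert_sorted(lst, item):
    pos = bisect.bisect_left(lst, item)
    lst.insert(pos, item)
    return pos

refs = []
shifts = 0  # total elements shifted by interior inserts
# for_each_ref() yields names in sorted order, modeled here:
for name in ["refs/tags/v%03d" % i for i in range(100)]:
    pos = insert_sorted(refs, name)
    shifts += len(refs) - 1 - pos  # elements after the insertion point
print(shifts)  # 0: every insert landed at the end, i.e. a plain append
```

With unsorted input the same loop would pay O(n) shifting per insert;
with sorted input it never shifts anything, which is why the pattern is
fine in practice.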
Re: Huge performance bottleneck reading packs
On Thu, Oct 13, 2016 at 09:17:34AM +0200, Vegard Nossum wrote:

> Oops. I disabled gc a while ago; one reason I did that is that it takes
> a long time to run and it has a tendency to kick in at the worst time. I
> guess I should really put it in cron then.
>
> I'm not sure if this is related, but I also had a problem with GitPython
> and large pack files in the past ("ValueError: Couldn't obtain fanout
> table" or "warning: packfile ./objects/pack/pack cannot be accessed")

Sounds like they didn't correctly implement the extra index fanout that
happens for packs above 2G. The old Grit library had a similar bug.

> and I have pack.packSizeLimit set to 512m to fix that.
> Although the whole repo is 17G so I guess it shouldn't be necessary to
> have that many pack files.

Using packSizeLimit does "solve" that problem, but it comes with its own
set of issues. There is a very good chance that your repository would be
much smaller than 17G as a single packfile, because Git does not allow
deltas across packs, and it does not optimize the placement of objects
to keep delta-related objects in a single pack. So you'll quite often be
storing full copies of objects that could otherwise be stored as a tiny
delta.

You might want to compare the resulting size for a full repack with and
without pack.packSizeLimit.

But I agree that is not the cause of your thousand packs. They are more
likely the accumulated cruft of a thousand fetches.

-Peff
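The no-deltas-across-packs effect Peff describes can be illustrated with
a toy model (plain Python, not git's actual delta machinery; the sizes
and the crude "delta" encoding are made up for illustration):

```python
import zlib

# Ten near-identical "objects": a shared 4 KiB base plus a tiny change.
base = b"x" * 4096
revisions = [base + b"change %d" % i for i in range(10)]

def packed_size(groups):
    # Each group models one pack: the first object is stored in full
    # (compressed), the rest as a crude "delta" -- just the suffix that
    # differs from the base, since a delta may only reference a base
    # inside the same pack.
    total = 0
    for group in groups:
        total += len(zlib.compress(group[0]))                 # full copy
        total += sum(len(obj) - len(base) for obj in group[1:])  # deltas
    return total

one_pack  = packed_size([revisions])
two_packs = packed_size([revisions[:5], revisions[5:]])
print(one_pack < two_packs)  # True: the split layout stores the base twice
```

Scaled up to a 17G repository with a 512m pack-size limit, every delta
chain cut at a pack boundary pays for another full copy of its base.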
Re: Huge performance bottleneck reading packs
On 10/13/2016 12:45 AM, Junio C Hamano wrote:
> Vegard Nossum writes:
>
>> A closer inspection reveals the problem to really be that this is an
>> extremely hot path with more than -- holy cow -- 4,106,756,451
>> iterations on the 'packed_git' list for a single 'git fetch' on my
>> repository. I'm guessing the patch above just made the inner loop
>> ever so slightly slower.
>
> Very plausible, and this ...
>
>> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
>> and 4 tmp_pack_* files).
>
> ... may explain why nobody else has seen a difference.
>
> Is there a reason why your repository has that many pack files? Is
> automatic GC not working for some reason?

Oops. I disabled gc a while ago; one reason I did that is that it takes
a long time to run and it has a tendency to kick in at the worst time. I
guess I should really put it in cron then.

I'm not sure if this is related, but I also had a problem with GitPython
and large pack files in the past ("ValueError: Couldn't obtain fanout
table" or "warning: packfile ./objects/pack/pack cannot be accessed")
and I have pack.packSizeLimit set to 512m to fix that.

Although the whole repo is 17G so I guess it shouldn't be necessary to
have that many pack files.

Will try Jeff's patch, then a gc. Thanks!

Vegard
Re: Huge performance bottleneck reading packs
On 10/13/2016 01:47 AM, Jeff King wrote:
> On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:
>
>> Also, is it possible to make the repository in question available? I
>> might be able to reproduce based on your description, but it would save
>> time if I could directly run gdb on your example.

I won't be able to make the repository available, sorry.

> I tried this by making a bunch of packs in linux.git (my standard "this
> is pretty big" repo), like so:
>
>   for i in $(seq 1000); do
>     git rev-list --objects HEAD~$((i+1))..HEAD~$i |
>     git pack-objects --delta-base-offset .git/objects/pack/pack
>   done
>
> and then doing a 25,000-object fetch from upstream (no significance to
> the number; that's just how far behind upstream I happened to be).
>
> However, I didn't notice any regression. In fact, it was much _slower_
> than v1.9.0, because that older version didn't have threaded
> index-pack.
>
> If you can't share the repo directly, can you tell us more about your
> fetch? How many objects are in your repository? How many objects are
> fetched? How many refs are there on the remote side?

The fetch doesn't actually get anything from the remote as everything is
already up to date (that makes the 2m40s times even more frustrating in
a way :-)). Here's count-objects:

$ git count-objects -v
warning: garbage found: .git/objects/pack/tmp_pack_pAZcu4
warning: garbage found: .git/objects/pack/tmp_pack_KhzrrI
warning: garbage found: .git/objects/pack/tmp_pack_mycfro
warning: garbage found: .git/objects/pack/tmp_pack_2kxKOn
count: 51609
size: 288768
in-pack: 23902336
packs: 1044
size-pack: 16588157
prune-packable: 48628
garbage: 4
size-garbage: 84792

There are some 20k refs on the remote, closer to 25k locally.

I'll try to get a profile (without your patch) before doing a gc run.

Vegard
Re: Huge performance bottleneck reading packs
On 10/13/2016 01:01 AM, Jeff King wrote:
> On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:
>
>> However, the commit found by 'git blame' above appears just fine to me,
>> I haven't been able to spot a bug in it.
>>
>> A closer inspection reveals the problem to really be that this is an
>> extremely hot path with more than -- holy cow -- 4,106,756,451
>> iterations on the 'packed_git' list for a single 'git fetch' on my
>> repository. I'm guessing the patch above just made the inner loop
>> ever so slightly slower.
>>
>> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
>> and 4 tmp_pack_* files).
>
> Yeah. I agree that the commit you found makes the check a little more
> expensive, but I think the root of the problem is calling
> prepare_packed_git_one many times. This _should_ happen once for each
> pack at program startup, and possibly again if we need to re-scan the
> pack directory to account for racing with a simultaneous repack.
>
> The latter is generally triggered when we fail to look up an object we
> expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
> directory before giving up, 2013-08-30) is playing a part. We dealt with
> that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
> of pack directory, 2015-06-09), but it would not surprise me if there is
> another spot that needs similar treatment.
>
> Does the patch below help?

Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
variation in time may be due to different CPU usage by other programs,
but I ran with/without the patch multiple times and the difference is
consistent).

Thanks,

Vegard
Re: Huge performance bottleneck reading packs
On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:

> Also, is it possible to make the repository in question available? I
> might be able to reproduce based on your description, but it would save
> time if I could directly run gdb on your example.

I tried this by making a bunch of packs in linux.git (my standard "this
is pretty big" repo), like so:

  for i in $(seq 1000); do
    git rev-list --objects HEAD~$((i+1))..HEAD~$i |
    git pack-objects --delta-base-offset .git/objects/pack/pack
  done

and then doing a 25,000-object fetch from upstream (no significance to
the number; that's just how far behind upstream I happened to be).

However, I didn't notice any regression. In fact, it was much _slower_
than v1.9.0, because that older version didn't have threaded index-pack.

If you can't share the repo directly, can you tell us more about your
fetch? How many objects are in your repository? How many objects are
fetched? How many refs are there on the remote side?

-Peff
Re: Huge performance bottleneck reading packs
On Wed, Oct 12, 2016 at 07:01:43PM -0400, Jeff King wrote:

> On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:
>
>> However, the commit found by 'git blame' above appears just fine to me,
>> I haven't been able to spot a bug in it.
>>
>> A closer inspection reveals the problem to really be that this is an
>> extremely hot path with more than -- holy cow -- 4,106,756,451
>> iterations on the 'packed_git' list for a single 'git fetch' on my
>> repository. I'm guessing the patch above just made the inner loop
>> ever so slightly slower.
>>
>> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
>> and 4 tmp_pack_* files).
>
> Yeah. I agree that the commit you found makes the check a little more
> expensive, but I think the root of the problem is calling
> prepare_packed_git_one many times. This _should_ happen once for each
> pack at program startup, and possibly again if we need to re-scan the
> pack directory to account for racing with a simultaneous repack.
>
> The latter is generally triggered when we fail to look up an object we
> expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
> directory before giving up, 2013-08-30) is playing a part. We dealt with
> that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
> of pack directory, 2015-06-09), but it would not surprise me if there is
> another spot that needs similar treatment.
>
> Does the patch below help?

Also, is it possible to make the repository in question available? I
might be able to reproduce based on your description, but it would save
time if I could directly run gdb on your example.

Specifically, I'm interested in the call graph that leads to calling
reprepare_packed_git().

-Peff
Re: Huge performance bottleneck reading packs
On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:

> However, the commit found by 'git blame' above appears just fine to me,
> I haven't been able to spot a bug in it.
>
> A closer inspection reveals the problem to really be that this is an
> extremely hot path with more than -- holy cow -- 4,106,756,451
> iterations on the 'packed_git' list for a single 'git fetch' on my
> repository. I'm guessing the patch above just made the inner loop
> ever so slightly slower.
>
> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> and 4 tmp_pack_* files).

Yeah. I agree that the commit you found makes the check a little more
expensive, but I think the root of the problem is calling
prepare_packed_git_one many times. This _should_ happen once for each
pack at program startup, and possibly again if we need to re-scan the
pack directory to account for racing with a simultaneous repack.

The latter is generally triggered when we fail to look up an object we
expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
directory before giving up, 2013-08-30) is playing a part. We dealt with
that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
of pack directory, 2015-06-09), but it would not surprise me if there is
another spot that needs similar treatment.

Does the patch below help?

diff --git a/builtin/fetch.c b/builtin/fetch.c
index d5329f9..c0f3c2c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -243,7 +243,7 @@ static void find_non_local_tags(struct transport *transport,
 		if (ends_with(ref->name, "^{}")) {
 			if (item && !has_object_file(&ref->old_oid) &&
 			    !will_fetch(head, ref->old_oid.hash) &&
-			    !has_sha1_file(item->util) &&
+			    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 			    !will_fetch(head, item->util))
 				item->util = NULL;
 			item = NULL;
@@ -256,7 +256,8 @@ static void find_non_local_tags(struct transport *transport,
 		 * to check if it is a lightweight tag that we want to
 		 * fetch.
 		 */
-		if (item && !has_sha1_file(item->util) &&
+		if (item &&
+		    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 		    !will_fetch(head, item->util))
 			item->util = NULL;
 
@@ -276,7 +277,8 @@ static void find_non_local_tags(struct transport *transport,
 		 * We may have a final lightweight tag that needs to be
 		 * checked to see if it needs fetching.
 		 */
-		if (item && !has_sha1_file(item->util) &&
+		if (item &&
+		    !has_sha1_file_with_flags(item->util, HAS_SHA1_QUICK) &&
 		    !will_fetch(head, item->util))
 			item->util = NULL;
 
-Peff
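The effect of the HAS_SHA1_QUICK flag in the patch above can be sketched
with a toy model (Python; the function and the object sets here are
illustrative, not git's code). The racy-check means every lookup of an
object we don't have triggers a pack-directory re-scan, unless the
caller opts into the quick check because it expects misses:

```python
# Model of the has_sha1_file() race check: on a miss, git re-scans the
# pack directory in case a concurrent repack moved the object into a new
# pack. HAS_SHA1_QUICK skips that re-scan for callers, like tag
# auto-following, for which misses are routine.
rescans = 0

def has_object(known, oid, quick=False):
    global rescans
    if oid in known:
        return True
    if not quick:
        rescans += 1  # models reprepare_packed_git(): re-read objects/pack
    return False

local = {"%040x" % i for i in range(100)}                # objects we have
missing_tags = ["%040x" % i for i in range(10000, 13700)]  # ~3,700 misses

for t in missing_tags:
    has_object(local, t)            # default: one re-scan per missing tag
slow_rescans = rescans

rescans = 0
for t in missing_tags:
    has_object(local, t, quick=True)  # quick: no re-scans at all
print(slow_rescans, rescans)  # 3700 0
```

With ~1042 packs, each of those 3,700 re-scans walks the whole
objects/pack directory, which is where the billions of list iterations
come from.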
Re: Huge performance bottleneck reading packs
Vegard Nossum writes:

> A closer inspection reveals the problem to really be that this is an
> extremely hot path with more than -- holy cow -- 4,106,756,451
> iterations on the 'packed_git' list for a single 'git fetch' on my
> repository. I'm guessing the patch above just made the inner loop
> ever so slightly slower.

Very plausible, and this ...

> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
> and 4 tmp_pack_* files).

... may explain why nobody else has seen a difference.

Is there a reason why your repository has that many pack files? Is
automatic GC not working for some reason?

"gc" would try to make sure that you have a reasonably low number of
packs, as having too many packs is detrimental to performance for
multiple reasons, including:

 * All objects in a single pack expressed in delta format (i.e. only
   the difference from another object is stored) must eventually have
   another object that their difference is based on recorded in the
   full format in the same packfile.

 * A single packfile records a single object only once, but it is
   normal (and often required because of the point above) that the
   same object appears in multiple packfiles.

 * Locating an object in a single packfile uses its .idx file to
   binary-search a sorted list of object names, which is efficient,
   but this cost is multiplied linearly by the number of packs you
   have in your repository.
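The last bullet can be put into rough numbers with a back-of-the-envelope
model (Python; the cost function is a simplification that counts only
binary-search comparisons and assumes evenly sized packs):

```python
import math

# Each lookup binary-searches one pack's .idx; with P packs, a miss (or
# an object that happens to live in the last pack tried) costs P searches
# of smaller indexes instead of one search of a big one.
def lookup_comparisons(num_objects, num_packs):
    per_pack = max(num_objects // num_packs, 1)
    return num_packs * math.ceil(math.log2(per_pack + 1))

# Numbers from the reporter's repo: ~24M packed objects, 1042 packs.
one_pack   = lookup_comparisons(23902336, 1)
many_packs = lookup_comparisons(23902336, 1042)
print(many_packs // one_pack)  # hundreds of times more work per lookup
```

Binary search barely notices index size (25 vs ~15 comparisons here),
so the P-fold multiplication of searches dominates completely.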
Huge performance bottleneck reading packs
Hi all,

I've bisected a performance regression (noticed by Quentin and myself)
which caused a 'git fetch' to go from ~1m30s to ~2m40s:

commit 47bf4b0fc52f3ad5823185a85f5f82325787c84b
Author: Jeff King
Date:   Mon Jun 30 13:04:03 2014 -0400

    prepare_packed_git_one: refactor duplicate-pack check

Reverting this commit from a recent mainline master brings the time back
down from ~2m24s to ~1m19s. The bisect log:

v2.8.1 -- 2m41s, 2m50s (bad)
v1.9.0 -- 1m39s, 1m46s (good)
2.3.4.312.gea1fd48 -- 2m40s
2.1.0.18.gc285171 -- 2m42s
2.0.0.140.g6753d8a -- 1m27s
2.0.1.480.g60e2f5a -- 1m34s
2.0.2.631.gad25da0 -- 2m39s
2.0.1.565.ge0a064a -- 1m30s
2.0.1.622.g2e42338 -- 2m29s
2.0.0.rc1.32.g5165dd5 -- 1m30s
2.0.1.607.g5418212 -- 1m32s
2.0.1.7.g6dda4e6 -- 1m28s
2.0.1.619.g6e40947 -- 2m25s
2.0.1.9.g47bf4b0 -- 2m18s
2.0.1.8.gd6cd00c -- 1m36.542s

However, the commit found by 'git blame' above appears just fine to me;
I haven't been able to spot a bug in it.

A closer inspection reveals the problem to really be that this is an
extremely hot path with more than -- holy cow -- 4,106,756,451
iterations on the 'packed_git' list for a single 'git fetch' on my
repository. I'm guessing the patch above just made the inner loop ever
so slightly slower.

My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
and 4 tmp_pack_* files).

I am convinced that it is not necessary to rescan the entire pack
directory 11,348 times or do all 4 _BILLION_ memcmp() calls for a single
'git fetch', even for a large repository like mine.

I could try to write a patch to reduce the number of times we rescan the
pack directory. However, I've never even looked at the file before
today, so any hints regarding what would need to be done would be
appreciated.

Thanks,

(Cc'd some people with changes in the area.)

Vegard