Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Stefan Saasen writes: > Anyway, long story short. We're interested to help but I'm not > entirely sure what that would look like at the moment. Are there > formed ideas floating around or would you be looking for some form of > proposal instead? I am not proposing anything or looking for proposals myself, actually. It is just somebody expressed interest in having tested older maintenance track that is kept alive in the past, so I was merely trying to help connect you with that old thread. If those who are interested in having such LTS track(s) need something specific from me, and if it will not be unrealistic maintenance burden, I am willing to help. That's all. For example, LTS group for whatever reason may nominate 2.2.x track as a base that they want to keep alive longer than other maintenance tracks and promise to test changes to them to keep it stable. Then I can help the effort by making sure people's bugfix patches would apply down to 2.2.x track (often people make mistake of using newer facility to fix or test the fix for an ancient bug, and bugfix topic branch ends up forked at a point much newer than where it should be). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
> Perhaps companies like Atlassian that rely on the stability of the > open source Git can spare some resources and join forces with like > minded folks on LTS of older maintenance tracks, if they are truly > interested in. We certainly can and would like to. I'm not entirely sure what that would entail though? >From reading through $gmane/264365 I've identified the following responsibilities/opportunities to help: >- Monitor "git log --first-parent maint-lts..master" and find > the tip of topic branches that need to be down-merged; > >- Down-merge such topics to maint-lts; this might involve > cherry-picking instead of merge, as the bugfix topics may > originally be done on the codebase newer than maint-lts; and more importantly testing the maint-lts version to ensure backported changes don't introduce regressions and the maint-lts branch is stable. This suggests specific, spaced LTS versions but in the same thread you mention maint-2.1or maint-2.2. So a different model could be maintaining old versions in a sliding window fashion (e.g. critical issues would be backported to the last 6 months worth of git releases). Maybe I'm getting ahead of myself here :) Anyway, long story short. We're interested to help but I'm not entirely sure what that would look like at the moment. Are there formed ideas floating around or would you be looking for some form of proposal instead? Cheers, Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
>> I've noticed Peff's patches on pu which suggest they will be available >> in git 2.5? > > Being on 'pu' (or 'next' for that matter) is not a suggestion for a > change to appear in any future version at all, even though it often > means that it would soon be merged to 'master' and will be in the > upcoming release to be on 'next' in early part of a development > cycle. Some larger topics would stay on 'next' for a few cycles. > >> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)? > > The topic will hopefully be merged to 'master' after 2.4 final is > released end of this month, down to 'maint' early May and will ship > with 2.4.1, unless there is unforeseen issues discovered in the > change while people try it out while it is in 'next' (which will > happen today, hopefully). Thanks for the clarification Junio. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Junio C Hamano writes: > Stefan Saasen writes: > >> I've noticed Peff's patches on pu which suggest they will be available >> in git 2.5? > > Being on 'pu' (or 'next' for that matter) is not a suggestion for a > change to appear in any future version at all, even though it often > means that it would soon be merged to 'master' and will be in the > upcoming release to be on 'next' in early part of a development > cycle. Some larger topics would stay on 'next' for a few cycles. > >> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)? > > The topic will hopefully be merged to 'master' after 2.4 final is > released end of this month, down to 'maint' early May and will ship > with 2.4.1, unless there is unforeseen issues discovered in the > change while people try it out while it is in 'next' (which will > happen today, hopefully). ... and then if I do not forget and if the topic is really important for real-world users, I am OK to merge it down to 2.3 and even 2.2 maintenance tracks later. But that will happen only after the topic hits 'maint', which will happen only after the topic hits 'master'. What you _can_ help is the "if I do not forget" part ;-) Also see a similar discussion we had recently http://thread.gmane.org/gmane.comp.version-control.git/264365 The key sentence from my part in the thread is > When I say "the tip of 'master' is meant to be more stable than > any tagged versions", I do mean it. and the reasoning behind it that is given in the paragraph before that, though. Perhaps companies like Atlassian that rely on the stability of the open source Git can spare some resources and join forces with like minded folks on LTS of older maintenance tracks, if they are truly interested in. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Stefan Saasen writes: > I've noticed Peff's patches on pu which suggest they will be available > in git 2.5? Being on 'pu' (or 'next' for that matter) is not a suggestion for a change to appear in any future version at all, even though it often means that it would soon be merged to 'master' and will be in the upcoming release to be on 'next' in early part of a development cycle. Some larger topics would stay on 'next' for a few cycles. > Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)? The topic will hopefully be merged to 'master' after 2.4 final is released end of this month, down to 'maint' early May and will ship with 2.4.1, unless there is unforeseen issues discovered in the change while people try it out while it is in 'next' (which will happen today, hopefully). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
>> If it is critical to some people, they can downmerge to their custom >> old installations of Git they maintain with ease, of course, and >> that "with ease" part is the reason why I try to apply fixes to tip >> of the original topic branch even though they were merged to the >> mainline eons ago ;-). > > I think it is a bigger deal for folks who do not ship a custom > installation, but expect to ship a third-party system that interacts > with whatever version of git their customers happen to have (in which > case they can only recommend their customers to upgrade). Yes, this is the situation we are facing. We allow our customers to use the git version that is supported/available on their OS (within a certain range of supported versions) so our customers usually don't compile from source. > Either way, though, I do not think it is the upstream Git project's > problem. That's fair enough, I was mostly enquiring about the official git versions this will land in so that we can advise customers what git version to use (or not to use). I've noticed Peff's patches on pu which suggest they will be available in git 2.5? Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)? While I certainly agree that this is specific to Git on NFS and not a more widespread git performance problem, I'd love to be able to message something other than "skip all the git version between and including git 2.2 - 2.4". I appreciate your consideration and thanks again for the swift response on this. Cheers, Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
On Mon, Apr 20, 2015 at 01:12:54PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Either way, though, I do not think it is the upstream Git project's > > problem. > > The commit to pick where to queue the fixes actually is my problem, > as I have this illusion that I'd be helping these derived works by > making it easier for them to merge, not cherry-pick. True, I had just meant the actual rolling of the releases. > But I would imagine that they may go the cherry-pick route anyway, > in which case I may be wasting my time worrying about them X-<. FWIW, I typically cherry-pick rather than merge. The resulting history is not as nice, but it means I don't have to think as hard about the history when doing so. It also means that topics may not be as well tested (e.g., they may have been implicitly relying on some other thing that happened upstream that I did _not_ cherry-pick). But we treat even cherry-picked upstream topics as their own feature branches, and do our normal internal testing and review. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Jeff King writes: > Either way, though, I do not think it is the upstream Git project's > problem. The commit to pick where to queue the fixes actually is my problem, as I have this illusion that I'd be helping these derived works by making it easier for them to merge, not cherry-pick. But I would imagine that they may go the cherry-pick route anyway, in which case I may be wasting my time worrying about them X-<. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
On Mon, Apr 20, 2015 at 01:04:11PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > ... But I don't know > > if this counts as critical (it is for you, certainly, but I don't think > > that many people are affected, as the crucial factor here is really the > > slow NFS filesystem operations). > > If it is critical to some people, they can downmerge to their custom > old installations of Git they maintain with ease, of course, and > that "with ease" part is the reason why I try to apply fixes to tip > of the original topic branch even though they were merged to the > mainline eons ago ;-). I think it is a bigger deal for folks who do not ship a custom installation, but expect to ship a third-party system that interacts with whatever version of git their customers happen to have (in which case they can only recommend their customers to upgrade). I don't know how Stash or GitLab installations work. GitHub ships our own custom git (which I maintain), though we are already on 2.3.x. Either way, though, I do not think it is the upstream Git project's problem. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Jeff King writes: > ... But I don't know > if this counts as critical (it is for you, certainly, but I don't think > that many people are affected, as the crucial factor here is really the > slow NFS filesystem operations). If it is critical to some people, they can downmerge to their custom old installations of Git they maintain with ease, of course, and that "with ease" part is the reason why I try to apply fixes to tip of the original topic branch even though they were merged to the mainline eons ago ;-). Thanks. The patches look good from cursory reading. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
On Sat, Apr 18, 2015 at 01:35:51PM +1000, Stefan Saasen wrote: > Here are the timings for the two patches: > [...] Thanks, that matches what I was hoping for. > My tweaked version of your second patch is: > [...] > - return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name); > + if (!find_pack_entry(sha1, &e)) > + return 0; > + if (e.p->freshened) > + return 1; > + return e.p->freshened = freshen_file(e.p->pack_name); > } Whooops, yeah, setting the flag is probably helpful. :) We usually try to avoid assignments in a return like this, so I've written it out a little more verbosely in my final version. I'll send those patches in a moment. [1/2]: sha1_file: freshen pack objects before loose [2/2]: sha1_file: only freshen packs once per run > Is there a chance to backport those changes to the 2.2+ branches? That's up to Junio. These patches can be applied straight to the jk/prune-mtime topic. Usually he would then merge the topic up to "maint", which at this would potentially become the next v2.3.x. If an issue is critical (e.g., a security vulnerability), he'll sometimes merge and roll maintenance releases for older versions. But I don't know if this counts as critical (it is for you, certainly, but I don't think that many people are affected, as the crucial factor here is really the slow NFS filesystem operations). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
> If it's not a problem, I'd love to see timings for your case with just > the first patch, and then with both. Thanks for the swift response, much appreciated Jeff! Here are the timings for the two patches: Patch 1 on top of 33d4221c79 Elapsed System User Min. :6.110 Min. :0.6700 Min. :0.3600 1st Qu.:6.580 1st Qu.:0.6900 1st Qu.:0.3900 Median :7.260 Median :0.7100 Median :0.4100 Mean :7.347 Mean :0.7248 Mean :0.4214 3rd Qu.:8.000 3rd Qu.:0.7400 3rd Qu.:0.4600 Max. :8.860 Max. :0.8700 Max. :0.5100 I've had to slightly tweak your second patch (`freshened` was never set) but applying the modified patch yielded even better results compared to patch 1: Elapsed System User Min. :0.38 Min. :0.03000 Min. :0.2900 1st Qu.:0.38 1st Qu.:0.04000 1st Qu.:0.3100 Median :0.39 Median :0.06000 Median :0.3200 Mean :0.43 Mean :0.05667 Mean :0.3519 3rd Qu.:0.42 3rd Qu.:0.07000 3rd Qu.:0.3600 Max. :0.68 Max. :0.08000 Max. :0.5700 This is pretty much back to the "before" state. The graph really tells the whole story: https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png (After is the change in #33d4221, Before the parent of #33d4221 and so on) The graph and the NFS stats can be found here: https://bitbucket.org/snippets/ssaasen/GeRE My tweaked version of your second patch is: diff --git a/cache.h b/cache.h index 51ee856..8982055 100644 --- a/cache.h +++ b/cache.h @@ -1168,6 +1168,7 @@ extern struct packed_git { int pack_fd; unsigned pack_local:1, pack_keep:1, + freshened:1, do_not_close:1; unsigned char sha1[20]; /* something like ".git/objects/pack/x.pack" */ diff --git a/sha1_file.c b/sha1_file.c index bc6322e..c0ccd4b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1) static int freshen_packed_object(const unsigned char *sha1) { struct pack_entry e; - return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name); + if (!find_pack_entry(sha1, &e)) + return 0; + if (e.p->freshened) + return 1; + return e.p->freshened = freshen_file(e.p->pack_name); } int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) The only change is that I assign the result of `freshen_file` to the `freshened` flag. I've only ran this with the test case I was using before but it looks like this is pretty much fixing the merge time changes we observed. Thanks again for the swift response. I've got my test setup sitting here, happy to rerun the tests if that'd be useful. Is there a chance to backport those changes to the 2.2+ branches? > You may also be interested in: > > http://thread.gmane.org/gmane.comp.version-control.git/266370 > > which addresses another performance problem related to the > freshen/recent code in v2.2. Thanks for the pointer, I'll have a look at that as well. Cheers, Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
Jeff King writes: > If it's not a problem, I'd love to see timings for your case with just > the first patch, and then with both. Thanks for two quick progress patches. > You may also be interested in: > > http://thread.gmane.org/gmane.comp.version-control.git/266370 > > which addresses another performance problem related to the > freshen/recent code in v2.2. That, too. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote: > The merge is created in a temporary location that uses alternates. The > temporary repository is on a local disk, the alternate object database > on an NFS mount. Is the alternate writeable? If we can't freshen the object, we fall back to storing the object locally, which could have a performance impact. But it looks from your tables below like the utime() call is succeeding, so that is probably not what is happening here. > My current hypothesis is that the additional `access`, but more > importantly the additional `utime` calls are responsible in the > increased merge times that we see. Yeah, that makes sense from your tables. The commit in question flips the order of the loose/packed check, and the packed check should be much faster on your NFS mount. Can you try: diff --git a/sha1_file.c b/sha1_file.c index 88f06ba..822aaef 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen); if (returnsha1) hashcpy(returnsha1, sha1); - if (freshen_loose_object(sha1) || freshen_packed_object(sha1)) + if (freshen_packed_object(sha1) || freshen_loose_object(sha1)) return 0; return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); } I think that should clear up the access() calls, but leave the utime() ones. > Looking at the detailed strace shows that utime will be called > repeatedly in same cases (e.g. > https://bitbucket.org/snippets/ssaasen/oend shows an example where the > same packfile will be updated more than 4000 times in a single merge). > > http://www.spinics.net/lists/git/msg240106.html discusses a potential > improvement for this case. Would that be an acceptable avenue to > improve this situation? I think so. Here's a tentative patch: diff --git a/cache.h b/cache.h index 3d3244b..72c6888 100644 --- a/cache.h +++ b/cache.h @@ -1174,6 +1174,7 @@ extern struct packed_git { int pack_fd; unsigned pack_local:1, pack_keep:1, +freshened:1, do_not_close:1; unsigned char sha1[20]; /* something like ".git/objects/pack/x.pack" */ diff --git a/sha1_file.c b/sha1_file.c index 822aaef..f27cbf1 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char *sha1) static int freshen_packed_object(const unsigned char *sha1) { struct pack_entry e; - return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name); + if (!find_pack_entry(sha1, &e)) + return 0; + if (e.p->freshened) + return 1; + return freshen_file(e.p->pack_name); } int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) If it's not a problem, I'd love to see timings for your case with just the first patch, and then with both. You may also be interested in: http://thread.gmane.org/gmane.comp.version-control.git/266370 which addresses another performance problem related to the freshen/recent code in v2.2. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects
We became aware of slow merge times with the following setup: The merge is created in a temporary location that uses alternates. The temporary repository is on a local disk, the alternate object database on an NFS mount. After some investigation we believe that #33d4221 (present in git 2.2.0, absent in 2.1.4) is causing this regression in merge time. The following are merge times (in seconds) with git@33d4221~ (2.1.2-393-gabcb865) (before the change) Elapsed SystemUser Min. :0.3700 Min. :0.04000 Min. :0.3000 1st Qu.:0.3800 1st Qu.:0.05000 1st Qu.:0.3100 Median :0.4000 Median :0.06000 Median :0.3300 Mean :0.4295 Mean :0.05905 Mean :0.3519 3rd Qu.:0.4600 3rd Qu.:0.07000 3rd Qu.:0.3600 Max. :0.5900 Max. :0.09000 Max. :0.4900 The following are merge times with git@33d4221 (2.1.2-394-g33d4221): Elapsed SystemUser Min. : 8.58 Min. :1.46 Min. :0.4400 1st Qu.: 9.63 1st Qu.:1.60 1st Qu.:0.4400 Median :10.64 Median :1.66 Median :0.4800 Mean :10.50 Mean :1.69 Mean :0.4986 3rd Qu.:11.13 3rd Qu.:1.81 3rd Qu.:0.5000 Max. :13.96 Max. :2.05 Max. :0.6700 As you can see the merge times are an order of magnitude slower after the change. The effect of #33d4221 can be seen when strace'ing the merge: Running strace on git@33d4221 yields % time seconds usecs/call callserrors syscall -- --- --- - - 70.790.714852 178 4018 utime 14.730.148789 3 50141 50123 lstat 13.880.140198 17 8074 8067 access 0.240.002455 614 4 rename 0.150.001493 3 577 write 0.060.000618 1065 close 0.040.000453 3 152 brk 0.040.000433 2716 mkdir 0.030.000310 841 fstat Running strace on git@33d4221~ yields % time seconds usecs/call callserrors syscall -- --- --- - - 98.370.138516 3 50141 50123 lstat 0.920.001292 2 577 write 0.370.000520 143831 access 0.180.000252 36 7 getcwd 0.170.000237 73620 stat 0.000.00 040 read 0.000.00 08730 open My current hypothesis is that the additional `access`, but more importantly the additional `utime` calls are responsible in the increased merge times that we see. NFS stats on the server for the tests seem to confirm this (see nfsstat-{after,before}-change.txt on https://bitbucket.org/snippets/ssaasen/oend). This is certainly due to the fact that this will all happen over NFS but in 2.1.4 this worked fine and starting with 2.2 this has become very slow. Looking at the detailed strace shows that utime will be called repeatedly in same cases (e.g. https://bitbucket.org/snippets/ssaasen/oend shows an example where the same packfile will be updated more than 4000 times in a single merge). http://www.spinics.net/lists/git/msg240106.html discusses a potential improvement for this case. Would that be an acceptable avenue to improve this situation? Best regards, Stefan Saasen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html