Re: Flurries of 'git reflog expire'
On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote:

> That's a few of the reasons we've switched over. I'd imagine most
> hosting providers take a similarly "hands on" approach to controlling
> their GC. Beyond a certain scale, it seems almost unavoidable. Git
> never has more than a repository-level view of the world; only the
> hosting provider can see the big picture.

Thanks for writing this out. I agree with all of the reasons given (in
my email which I suspect crossed with yours, I just said "throttling",
but there really are a lot of other reasons).

-Peff
Re: Flurries of 'git reflog expire'
On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey wrote:
> On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:
>
>> I also want to add that Bitbucket Server 5.x includes totally
>> rewritten GC handling. 5.0.x automatically disables auto GC in all
>> repositories and manages it explicitly, and 5.1.x fully removes use of
>> "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.
>

I believe I've seen some commits on the mailing list that suggest "git
gc --auto" manages its concurrency better in newer versions than it
used to, but even then it can only manage its concurrency within a
single repository. For a hosting server with thousands, or tens of
thousands, of active repositories, there still wouldn't be any
protection against "git gc --auto" running concurrently in dozens of
them at the same time.

But it's not only about concurrency. "git gc" (and by extension "git
gc --auto") is a general purpose tool, designed to generally do what
you need, and to mostly stay out of your way while it does it. I'd
hazard to say it's not really designed for managing heavily-trafficked
repositories on busy hosting services, though, and as a result, there
are things it can't do.

For example, I can configure auto GC to run based on how many loose
objects or packs I have, but there's no heuristic to make it repack
refs when I have a lot of loose ones, or configure it to _only_ pack
refs without repacking objects or pruning reflogs.

There are knobs for various things (like "gc.*.reflogExpire"), but
those don't give complete control. Even if I set
"gc.reflogExpire=never", "git gc" still forks "git reflog expire
--all" (compared to "gc.packRefs=false", which completely prevents
forking "git pack-refs").

A trace on "git gc" shows this:

$ GIT_TRACE=1 git gc
00:10:45.058066 git.c:437 trace: built-in: git 'gc'
00:10:45.067075 run-command.c:369 trace: run_command: 'pack-refs' '--all' '--prune'
00:10:45.077086 git.c:437 trace: built-in: git 'pack-refs' '--all' '--prune'
00:10:45.084098 run-command.c:369 trace: run_command: 'reflog' 'expire' '--all'
00:10:45.093102 git.c:437 trace: built-in: git 'reflog' 'expire' '--all'
00:10:45.097088 run-command.c:369 trace: run_command: 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.106096 git.c:437 trace: built-in: git 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.107098 run-command.c:369 trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
00:10:45.127117 git.c:437 trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
Counting objects: 6, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
00:10:45.173161 run-command.c:369 trace: run_command: 'prune' '--expire' '2.weeks.ago'
00:10:45.184171 git.c:437 trace: built-in: git 'prune' '--expire' '2.weeks.ago'
00:10:45.199202 run-command.c:369 trace: run_command: 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.208193 git.c:437 trace: built-in: git 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.212198 run-command.c:369 trace: run_command: 'rerere' 'gc'
00:10:45.221223 git.c:437 trace: built-in: git 'rerere' 'gc'

The bare repositories used by Bitbucket Server:
* Don't have reflogs enabled generally, and for the ones that are
  enabled "gc.*.reflogExpire" is set to "never"
* Never have worktrees, so they don't need to be pruned
* Never use rerere, so that doesn't need to GC
* Have pruning disabled if they've been forked, due to using
  alternates to manage disk space

That means of all the commands "git gc" runs, under the covers, at
most only "pack-refs", "repack" and sometimes "prune" have any value.
"reflog expire --all" in particular is extremely likely to fail.

Which brings up another consideration. "git gc --auto" has no sense of
context, or adjacent behavior. Even if it correctly guards against
concurrency, it still doesn't know what else is going on. Immediately
after a push, Bitbucket Server has many other housekeeping tasks it
performs, especially around pull requests. That means pull request
refs are disproportionately likely to be "moving" immediately after a
push completes--exactly
Re: Flurries of 'git reflog expire'
On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote:

> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly. We
> moved away from "git gc" specifically to avoid the "git reflog expire
> --all", because there's no config setting that _fully disables_
> forking that process.

FWIW, I think auto-gc in general is not a good way to handle
maintenance on a busy hosting server. Repacking can be very resource
hungry (both CPU and memory), and it needs to be throttled. You
_could_ throttle with an auto-gc hook, but that isn't very elegant when
it comes to re-queueing jobs which fail or timeout.

The right model IMHO (and what GitHub uses, and what I'm guessing
Bitbucket is doing in more recent versions) is to make note of write
operations in a data structure, then use that data to schedule
maintenance in a job queue. But that can never really be part of Git
itself, as the notion of a system job queue is outside its scope.

-Peff
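
To make the "note writes, then schedule maintenance from a queue" model concrete, here is a minimal shell sketch. Everything specific in it is an assumption made for illustration: the queue file name, the note_write and drain_queue helpers, and the use of plain "git gc" as the maintenance job. It is not how GitHub or Bitbucket Server implement this, and a real queue would need atomic hand-off and retry handling.

```shell
#!/bin/sh
# Sketch only: a flat file acts as the "data structure" that records writes.
set -eu
work=$(mktemp -d)
queue="$work/dirty-repos"            # hypothetical queue file
: > "$queue"

# Hypothetical helper: a post-receive hook would call this after each push.
note_write() { printf '%s\n' "$1" >> "$queue"; }

# Hypothetical worker: deduplicate the queue, then maintain one repository
# at a time, which is a crude form of throttling.
drain_queue() {
    sort -u "$queue" > "$queue.batch"
    : > "$queue"
    while IFS= read -r r; do
        git -C "$r" gc --quiet && echo "maintained $r"
    done < "$queue.batch"
}

repo="$work/r.git"                   # throwaway repository for the demo
git init --quiet --bare "$repo"
note_write "$repo"; note_write "$repo"; note_write "$repo"   # three pushes
result=$(drain_queue)                # one maintenance run, not three
count=$(printf '%s\n' "$result" | wc -l | tr -d ' ')
echo "$result"
```

Three recorded pushes collapse into a single maintenance run, which is the property auto-gc triggered from each receive-pack lacks.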
[BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire'
[Updating the subject since I think this really is a bug].

On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote:

> > I also want to add that Bitbucket Server 5.x includes totally
> > rewritten GC handling. 5.0.x automatically disables auto GC in all
> > repositories and manages it explicitly, and 5.1.x fully removes use of
> > "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

It's _supposed_ to take a lock, even in older versions. See 64a99eb47
(gc: reject if another gc is running, unless --force is given,
2013-08-08).

But it looks like before we take that lock, we sometimes run pack-refs
and reflog expire. This is due to 62aad1849 (gc --auto: do not lock
refs in the background, 2014-05-25).

IMHO this is buggy; it should be checking the lock before calling
gc_before_repack() and daemonizing. Annoyingly, the lock code
interacts badly with daemonizing because the latter will fork to a new
process. So the simple solution like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 2ba50a287..79480124a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		if (report_last_gc_error())
 			return -1;
 
+		if (lock_repo_for_gc(force, ))
+			return 0;
+
 		if (gc_before_repack())
 			return -1;
 		/*

means that anybody looking at the lockfile will report the wrong pid
(and thus think the lock is invalid). I guess we'd need to update it in
place after daemonizing.

-Peff
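
The ordering argued for here (take the lock first, then do any gc work, including pack-refs and reflog expire) can be illustrated outside of git with a wrapper script. This is only a sketch under stated assumptions: the maintenance.lock directory and the run_gc_locked helper are hypothetical, the lock is a plain mkdir lock rather than git's own gc.pid file, and it does nothing to fix the detached auto-gc behaviour inside git itself.

```shell
#!/bin/sh
# Sketch only: serialize "git gc" per repository with an external lock.
set -eu
repo=$(mktemp -d)                    # throwaway repository for the demo
git init --quiet --bare "$repo"
lockdir="$repo/maintenance.lock"     # hypothetical name, not git's gc.pid

# Hypothetical helper: acquire the lock BEFORE any gc step runs.
run_gc_locked() {
    # mkdir is atomic, so exactly one caller can create the directory.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "skipped"               # another run holds the lock
        return 0
    fi
    rc=0
    git -C "$repo" gc --quiet || rc=$?
    rmdir "$lockdir"                 # release even if gc failed
    if [ "$rc" -eq 0 ]; then echo "ran"; else echo "failed"; fi
}

first=$(run_gc_locked)               # lock is free, so gc runs
mkdir "$lockdir"                     # simulate a concurrent gc holding the lock
second=$(run_gc_locked)              # backs off without starting any gc work
rmdir "$lockdir"
echo "first=$first second=$second"
```

Because the lock is checked before anything else, a second caller never reaches the reflog expiry step, which is the behaviour the patch above is aiming for.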
Re: Flurries of 'git reflog expire'
On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:
> Do you know what version of Bitbucket Server is in use?

We're on the newest 4.x.

...
> - Run "git config gc.auto 0" in

Going that route.

...
> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly.

That's the part that irks me. This shouldn't be necessary - git itself
should make sure auto GC isn't run in parallel. Now I probably can't
evaluate whether a git upgrade would fix this, but given that you
are going the do-gc-ourselves route I suppose it wouldn't.

...
> Upgrading to 5.x can be a bit of an undertaking, since the major
> version brings API changes,

The upgrade is on my todo list, but there are plugins that don't
appear to be ready for 5.0, notably the Jenkins one.

Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds
Date: Fri, 22 Jan 2010 07:29:21 -0800
Re: Flurries of 'git reflog expire'
I'm one of the Bitbucket Server developers. My apologies; I just
noticed this thread or I would have jumped in sooner!

On Thu, Jul 6, 2017 at 6:31 AM, Andreas Krey wrote:
> On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
>> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
> ...
>> And what does the process tree look like?
>
> Lots (~ 10) of
>
>   \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
>   |   \_ git gc --auto --quiet
>   |       \_ git reflog expire --all
>
> plus another dozen gc/expire pairs where the parent is already gone.
> All with the same arguments - auto GC.

Do you know what version of Bitbucket Server is in use? Based on the
fact that it's "git gc --auto" triggered from a "git receive-pack",
that implies two things:
- You're on a 4.x version of Bitbucket Server
- The repository (68) has never been forked

Depending on your Bitbucket Server version (this being the reason I
asked), there are a couple different fixes available:

- Fork the repository. You don't need to _use_ the fork, but having a
  fork existing will trigger Bitbucket Server to disable auto GC and
  fully manage that itself. That includes managing both _concurrency_
  and _frequency_ of GC. This works on all versions of Bitbucket
  Server.
- Run "git config gc.auto 0" in
  /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 to
  disable auto GC yourself. This may be preferable to forking the
  repository, which, in addition to disabling auto GC, also disables
  object pruning. However, you must be running at least Bitbucket
  Server 4.6.0 for this approach to work. Otherwise auto GC will
  simply be reenabled the first time Bitbucket Server goes to trigger
  GC, when it detects that the repository has no forks.

Assuming you're on 4.6.0 or newer, either approach should fix the
issue. If you're on 4.5 or older, forking is the only viable approach
unless you upgrade Bitbucket Server first.

I also want to add that Bitbucket Server 5.x includes totally
rewritten GC handling. 5.0.x automatically disables auto GC in all
repositories and manages it explicitly, and 5.1.x fully removes use of
"git gc" in favor of running relevant plumbing commands directly. We
moved away from "git gc" specifically to avoid the "git reflog expire
--all", because there's no config setting that _fully disables_
forking that process. By default our bare clones only have reflogs for
pull request refs, and we've explicitly configured those to never
expire, so all "git reflog expire --all" can do is use up I/O and,
quite frequently, fail because refs are updated. Since we stopped
running "git gc", we've not yet seen any GC failures on our internal
Bitbucket Server clusters.

Bitbucket Server 5.1.x also includes a new "gc.log" (not to be
confused with the one Git itself writes) which retains a record of
every GC-related process we run in each repository, and how long that
process took to complete. That can be useful for getting clearer
insight into both how often GC work is being done, and how long it's
taking.

Upgrading to 5.x can be a bit of an undertaking, since the major
version brings API changes, so it's totally understandable that many
organizations haven't upgraded yet. I'm just noting that these
improvements are there for when such an upgrade becomes viable.

Hope this helps!
Bryan

>
> I'd wager that each push sees that a GC is in order,
> and doesn't notice that there is one already running.
>
> - Andreas
>
> --
> "Totally trivial. Famous last words."
> From: Linus Torvalds
> Date: Fri, 22 Jan 2010 07:29:21 -0800
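
As a rough picture of what "running relevant plumbing commands directly" could look like, the sketch below disables auto GC and then runs pack-refs, repack and prune by hand, skipping "git reflog expire" entirely. The arguments are copied from the GIT_TRACE output of "git gc" shown elsewhere in this thread; whether Bitbucket Server uses these exact commands or flags is not stated, so treat the selection as an assumption. The temporary repository is a stand-in for a real bare repository path.

```shell
#!/bin/sh
# Sketch only: maintenance without "git gc", so no reflog expiry is forked.
set -eu
repo=$(mktemp -d)                    # stand-in for a real bare repository
git init --quiet --bare "$repo"

# Stop receive-pack and friends from triggering "git gc --auto".
git -C "$repo" config gc.auto 0

# Pack loose refs (same arguments "git gc" passes in the trace).
git -C "$repo" pack-refs --all --prune

# Repack objects (same arguments as in the trace, plus -q to stay quiet).
git -C "$repo" repack -d -l -A --unpack-unreachable=2.weeks.ago -q

# Prune old unreachable objects; per the thread this is disabled for
# repositories that have been forked.
git -C "$repo" prune --expire 2.weeks.ago

gc_auto=$(git -C "$repo" config --get gc.auto)
status=done
echo "gc.auto=$gc_auto status=$status"
```

Running each step separately also makes it possible to time and log them individually, in the spirit of the per-repository "gc.log" described above.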
Re: Flurries of 'git reflog expire'
On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
...
> I seem to recall that using --stale-fix is also extremely expensive,
> too. What do the command line arguments for the slow commands look like?

The problem isn't that the expire is slow, it is that there are many of
them, waiting for disk writes.

> And what does the process tree look like?

Lots (~ 10) of

  \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |       \_ git reflog expire --all

plus another dozen gc/expire pairs where the parent is already gone.
All with the same arguments - auto GC.

I'd wager that each push sees that a GC is in order,
and doesn't notice that there is one already running.

- Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds
Date: Fri, 22 Jan 2010 07:29:21 -0800
Re: Flurries of 'git reflog expire'
On Tue, 04 Jul 2017 11:43:33 +, Ævar Arnfjörð Bjarmason wrote:
...
> You can set gc.auto=0 in the repo to disable auto-gc, and play with
> e.g. the reflog expire values, see the git-gc manpage.
>
> But then you need to run your own gc, which is not a bad idea anyway
> with a dedicated git server.

Actually, bitbucket should be doing this. Although I can't quite rule
out the possibility that we reenabled GC in this repo some time ago.

> But it would be good to get to the bottom of this, we shouldn't be
> running these concurrently.

Indeed. Unfortunately this isn't easily reproduced in the test instance,
so I will need to get a newer git under the production bitbucket.

There are quite a few of

  \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |       \_ git reflog expire --all

in the process tree; apparently a new one gets started even though
previous ones are still running.

The number of running expires grew slowly, on the order of many minutes.

Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds
Date: Fri, 22 Jan 2010 07:29:21 -0800
Re: Flurries of 'git reflog expire'
On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:

> Questions:
>
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristics don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
>
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

If it's using --expire-unreachable (which a default "git gc" does), that
means we have to traverse the entire history to see what is reachable
and what is not. Added on to a normal git-gc, that's usually not a big
deal (it has to do that traversal and much more for the repack). But if
bitbucket is triggering it for other operations, that could be related
(I don't think anything but gc should ever run it otherwise).

I seem to recall that using --stale-fix is also extremely expensive,
too. What do the command line arguments for the slow commands look like?
And what does the process tree look like?

-Peff
Re: Flurries of 'git reflog expire'
On Tue, Jul 04 2017, Andreas Krey jotted:

> Hi everyone,
>
> how is 'git reflog expire' triggered? We're occasionally seeing a lot
> of them running in parallel on a single one of our repos (atlassian
> bitbucket), and this usually means that the machine is not very
> responsive for twenty minutes, the repo being as big as it is.

Assuming Linux, what does 'ps auxf' look like when this happens? Is the
parent a 'git gc --auto'?

> The server is still on git 2.6.2 (and bitbucket 4.14.5).

You might want to upgrade, we've had a bunch of changes since then,
maybe some of these fix it:

    git log --reverse -p -L'/^static.*lock_repo_for/,/^}/:builtin/gc.c'

> Questions:
>
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristics don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
>
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

You can set gc.auto=0 in the repo to disable auto-gc, and play with
e.g. the reflog expire values, see the git-gc manpage.

But then you need to run your own gc, which is not a bad idea anyway
with a dedicated git server.

But it would be good to get to the bottom of this, we shouldn't be
running these concurrently.
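
For reference, the "disable auto-gc and run your own" suggestion amounts to something like the following sketch. The temporary repository stands in for the real repository path, and running "git gc" from cron at a quiet hour is an assumption about scheduling, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch only: switch one repository from auto-gc to explicit gc.
set -eu
repo=$(mktemp -d)                    # stand-in for the real repository path
git init --quiet --bare "$repo"

# Disable the auto-gc heuristic for this repository only.
git -C "$repo" config gc.auto 0

# Run maintenance explicitly; in production this line would live in a
# cron job or similar scheduler (an assumption, pick whatever fits).
git -C "$repo" gc --quiet

gc_auto=$(git -C "$repo" config --get gc.auto)
echo "gc.auto=$gc_auto"
```

With gc.auto set to 0, pushes no longer spawn "git gc --auto", so the only gc runs are the ones scheduled explicitly.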
Flurries of 'git reflog expire'
Hi everyone,

how is 'git reflog expire' triggered? We're occasionally seeing a lot
of them running in parallel on a single one of our repos (atlassian
bitbucket), and this usually means that the machine is not very
responsive for twenty minutes, the repo being as big as it is.

The server is still on git 2.6.2 (and bitbucket 4.14.5).

Questions:

What can be done about this? Cronjob 'git reflog expire' at midnight,
so the heuristics don't trigger during the day? (The relnotes don't
mention anything after 2.4.0, so I suppose a git upgrade won't help.)

What is the actual cause? Bad heuristics in git itself, or does
bitbucket run them too often (improbable)?

Andreas

--
"Totally trivial. Famous last words."
From: Linus Torvalds
Date: Fri, 22 Jan 2010 07:29:21 -0800