Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
On Tue, Jul 11, 2017 at 12:35:50AM -0700, Bryan Turner wrote:

> That's a few of the reasons we've switched over. I'd imagine most
> hosting providers take a similarly "hands on" approach to controlling
> their GC. Beyond a certain scale, it seems almost unavoidable. Git
> never has more than a repository-level view of the world; only the
> hosting provider can see the big picture.

Thanks for writing this out. I agree with all of the reasons given (in
my email which I suspect crossed with yours, I just said "throttling",
but there really are a lot of other reasons).

-Peff


Re: Flurries of 'git reflog expire'

2017-07-11 Thread Bryan Turner
On Mon, Jul 10, 2017 at 9:45 PM, Andreas Krey  wrote:
> On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:
> 
>> I also want to add that Bitbucket Server 5.x includes totally
>> rewritten GC handling. 5.0.x automatically disables auto GC in all
>> repositories and manages it explicitly, and 5.1.x fully removes use of
>> "git gc" in favor of running relevant plumbing commands directly.
>
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.
>

I believe I've seen some commits on the mailing list that suggest "git
gc --auto" manages its concurrency better in newer versions than it
used to, but even then it can only manage its concurrency within a
single repository. For a hosting server with thousands, or tens of
thousands, of active repositories, there still wouldn't be any
protection against "git gc --auto" running concurrently in dozens of
them at the same time.
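
To illustrate what a limit across repositories could look like, here is a
minimal shell sketch. It assumes the hosting layer, not Git, launches
maintenance jobs. SLOT_DIR, MAX_GC_SLOTS and both function names are
invented for this example; it is not how Bitbucket Server does it.

```shell
#!/bin/sh
# Hypothetical server-wide throttle: at most MAX_GC_SLOTS maintenance jobs
# may run at once, across all repositories on the machine.
SLOT_DIR="${SLOT_DIR:-${TMPDIR:-/tmp}/gc-slots}"
MAX_GC_SLOTS="${MAX_GC_SLOTS:-2}"

# Try to claim a free slot. mkdir is atomic, so two processes can never
# claim the same slot. Prints the slot path on success and fails when
# every slot is taken.
acquire_gc_slot() {
    mkdir -p "$SLOT_DIR" || return 1
    i=1
    while [ "$i" -le "$MAX_GC_SLOTS" ]; do
        if mkdir "$SLOT_DIR/slot.$i" 2>/dev/null; then
            echo "$SLOT_DIR/slot.$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Give the slot back once the maintenance job has finished.
release_gc_slot() {
    rmdir "$1"
}
```

A scheduler would claim a slot before starting work in any repository and
re-queue the job when none is free. A job that crashes leaves its slot
claimed, so a real implementation would also need stale-slot cleanup.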

But it's not only about concurrency. "git gc" (and by extension "git
gc --auto") is a general purpose tool, designed to generally do what
you need, and to mostly stay out of your way while it does it. I'd
hazard to say it's not really designed for managing heavily-trafficked
repositories on busy hosting services, though, and as a result, there
are things it can't do.

For example, I can configure auto GC to run based on how many loose
objects or packs I have, but there's no heuristic to make it repack
refs when I have a lot of loose ones, or configure it to _only_ pack
refs without repacking objects or pruning reflogs. There are knobs for
various things (like "gc.*.reflogExpire"), but those don't give
complete control. Even if I set "gc.reflogExpire=never", "git gc"
still forks "git reflog expire --all" (compared to
"gc.packRefs=false", which completely prevents forking "git
pack-refs").

A trace on "git gc" shows this:
$ GIT_TRACE=1 git gc
00:10:45.058066 git.c:437   trace: built-in: git 'gc'
00:10:45.067075 run-command.c:369   trace: run_command: 'pack-refs' '--all' '--prune'
00:10:45.077086 git.c:437   trace: built-in: git 'pack-refs' '--all' '--prune'
00:10:45.084098 run-command.c:369   trace: run_command: 'reflog' 'expire' '--all'
00:10:45.093102 git.c:437   trace: built-in: git 'reflog' 'expire' '--all'
00:10:45.097088 run-command.c:369   trace: run_command: 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.106096 git.c:437   trace: built-in: git 'repack' '-d' '-l' '-A' '--unpack-unreachable=2.weeks.ago'
00:10:45.107098 run-command.c:369   trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
00:10:45.127117 git.c:437   trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--unpack-unreachable=2.weeks.ago' '--local' '--delta-base-offset' 'objects/pack/.tmp-15212-pack'
Counting objects: 6, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 6 (delta 0)
00:10:45.173161 run-command.c:369   trace: run_command: 'prune' '--expire' '2.weeks.ago'
00:10:45.184171 git.c:437   trace: built-in: git 'prune' '--expire' '2.weeks.ago'
00:10:45.199202 run-command.c:369   trace: run_command: 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.208193 git.c:437   trace: built-in: git 'worktree' 'prune' '--expire' '3.months.ago'
00:10:45.212198 run-command.c:369   trace: run_command: 'rerere' 'gc'
00:10:45.221223 git.c:437   trace: built-in: git 'rerere' 'gc'

The bare repositories used by Bitbucket Server:
* Don't have reflogs enabled generally, and for the ones that are
enabled "gc.*.reflogExpire" is set to "never"
* Never have worktrees, so they don't need to be pruned
* Never use rerere, so that doesn't need to GC
* Have pruning disabled if they've been forked, due to using
alternates to manage disk space

That means that, of all the commands "git gc" runs under the covers,
only "pack-refs", "repack" and sometimes "prune" have any value.
"reflog expire --all" in particular is extremely likely to fail. Which
brings up another consideration.

"git gc --auto" has no sense of context, or adjacent behavior. Even if
it correctly guards against concurrency, it still doesn't know what
else is going on. Immediately after a push, Bitbucket Server has many
other housekeeping tasks it performs, especially around pull requests.
That means pull request refs are disproportionately likely to be
"moving" immediately after a push completes--exactly 

Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
On Thu, Jul 06, 2017 at 10:01:05AM -0700, Bryan Turner wrote:

> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly. We
> moved away from "git gc" specifically to avoid the "git reflog expire
> --all", because there's no config setting that _fully disables_
> forking that process.

FWIW, I think auto-gc in general is not a good way to handle maintenance
on a busy hosting server. Repacking can be very resource hungry (both
CPU and memory), and it needs to be throttled. You _could_ throttle with
an auto-gc hook, but that isn't very elegant when it comes to
re-queueing jobs which fail or time out.

The right model IMHO (and what GitHub uses, and what I'm guessing
Bitbucket is doing in more recent versions) is to make note of write
operations in a data structure, then use that data to schedule
maintenance in a job queue. But that can never really be part of Git
itself, as the notion of a system job queue is outside its scope.
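
As a rough shell sketch of that model (the write log here is just a flat
file, and QUEUE_FILE and both function names are made up for the example;
a real deployment would use a proper job queue):

```shell
#!/bin/sh
# Hypothetical write log: one repository path per line.
QUEUE_FILE="${QUEUE_FILE:-${TMPDIR:-/tmp}/gc-dirty-repos}"

# Called after each write operation (e.g. from a post-receive hook). It
# only records that the repository changed; no maintenance runs in the
# push path.
note_write() {
    printf '%s\n' "$1" >>"$QUEUE_FILE"
}

# Called by a single scheduler process. Takes the current log with an
# atomic rename and prints each dirty repository once, most-written
# first, so the caller can feed them to throttled maintenance jobs.
drain_writes() {
    [ -s "$QUEUE_FILE" ] || return 0
    batch="$QUEUE_FILE.$$"
    mv "$QUEUE_FILE" "$batch" || return 1
    sort "$batch" | uniq -c | sort -rn | sed 's/^ *[0-9][0-9]* //'
    rm -f "$batch"
}
```

A job that fails or times out can be re-queued by calling note_write again
for that repository. A writer that appends at the very moment of the
rename can lose its entry; the sketch ignores that race.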

-Peff


[BUG] detached auto-gc does not respect lock for 'reflog expire', was Re: Flurries of 'git reflog expire'

2017-07-11 Thread Jeff King
[Updating the subject since I think this really is a bug].

On Tue, Jul 11, 2017 at 06:45:53AM +0200, Andreas Krey wrote:

> > I also want to add that Bitbucket Server 5.x includes totally
> > rewritten GC handling. 5.0.x automatically disables auto GC in all
> > repositories and manages it explicitly, and 5.1.x fully removes use of
> > "git gc" in favor of running relevant plumbing commands directly.
> 
> That's the part that irks me. This shouldn't be necessary - git itself
> should make sure auto GC isn't run in parallel. Now I probably can't
> evaluate whether a git upgrade would fix this, but given that you
> are going the do-gc-ourselves route I suppose it wouldn't.

It's _supposed_ to take a lock, even in older versions. See 64a99eb47
(gc: reject if another gc is running, unless --force is given,
2013-08-08).

But it looks like before we take that lock, we sometimes run pack-refs
and reflog expire. This is due to 62aad1849 (gc --auto: do not lock refs
in the background, 2014-05-25). IMHO this is buggy; it should be
checking the lock before calling gc_before_repack() and daemonizing.

Annoyingly, the lock code interacts badly with daemonizing because the
latter will fork to a new process. So the simple solution like:

diff --git a/builtin/gc.c b/builtin/gc.c
index 2ba50a287..79480124a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -414,6 +414,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			if (report_last_gc_error())
 				return -1;
 
+			if (lock_repo_for_gc(force, &pid))
+				return 0;
+
 			if (gc_before_repack())
 				return -1;
 			/*

means that anybody looking at the lockfile will report the wrong pid
(and thus think the lock is invalid). I guess we'd need to update it in
place after daemonizing.
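
To illustrate the failure mode, here is a shell sketch (not the actual C)
of how a second gc might judge a pid-style lockfile. The one-line
"<pid> <hostname>" layout and the function name are assumptions of the
example, and it ignores the hostname entirely.

```shell
#!/bin/sh
# Succeeds if the process recorded in lockfile $1 is still alive.
# Simplification: a pid we are not allowed to signal counts as dead here.
gc_lock_is_live() {
    [ -f "$1" ] || return 1
    pid=
    read -r pid _host <"$1" || :
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

If the lockfile still holds the pid of the process that exited when we
daemonized, this check fails even though the background gc is running,
and a second gc goes ahead anyway.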

-Peff


Re: Flurries of 'git reflog expire'

2017-07-10 Thread Andreas Krey
On Thu, 06 Jul 2017 10:01:05 +, Bryan Turner wrote:

> Do you know what version of Bitbucket Server is in use?

We're on the newest 4.x.

...
> - Run "git config gc.auto 0" in

Going that route.

...
> I also want to add that Bitbucket Server 5.x includes totally
> rewritten GC handling. 5.0.x automatically disables auto GC in all
> repositories and manages it explicitly, and 5.1.x fully removes use of
> "git gc" in favor of running relevant plumbing commands directly.

That's the part that irks me. This shouldn't be necessary - git itself
should make sure auto GC isn't run in parallel. Now I probably can't
evaluate whether a git upgrade would fix this, but given that you
are going the do-gc-ourselves route I suppose it wouldn't.

...
> Upgrading to 5.x can be a bit of an undertaking, since the major
> version brings API changes,

The upgrade is on my todo list, but there are plugins that don't
appear to be ready for 5.0, notably the Jenkins one.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: Flurries of 'git reflog expire'

2017-07-06 Thread Bryan Turner
I'm one of the Bitbucket Server developers. My apologies; I just
noticed this thread or I would have jumped in sooner!

On Thu, Jul 6, 2017 at 6:31 AM, Andreas Krey  wrote:
> On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
>> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
> ...
>> And what does the process tree look like?
>
> Lots (~ 10) of
>
>   \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
>   |   \_ git gc --auto --quiet
>   |   \_ git reflog expire --all
>
> plus another dozen gc/expire pairs where the parent is already gone.
> All with the same arguments - auto GC.

Do you know what version of Bitbucket Server is in use? Based on the
fact that it's "git gc --auto" triggered from a "git receive-pack",
that implies two things:
- You're on a 4.x version of Bitbucket Server
- The repository (68) has never been forked

Depending on your Bitbucket Server version (this being the reason I
asked), there are a couple different fixes available:

- Fork the repository. You don't need to _use_ the fork, but having a
fork existing will trigger Bitbucket Server to disable auto GC and
fully manage that itself. That includes managing both _concurrency_
and _frequency_ of GC. This works on all versions of Bitbucket Server.

- Run "git config gc.auto 0" in
/opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 to
disable auto GC yourself. This may be preferable to forking the
repository, which, in addition to disabling auto GC, also disables
object pruning. However, you must be running at least Bitbucket Server
4.6.0 for this approach to work. Otherwise auto GC will simply be
reenabled the first time Bitbucket Server goes to trigger GC, when it
detects that the repository has no forks.

Assuming you're on 4.6.0 or newer, either approach should fix the
issue. If you're on 4.5 or older, forking is the only viable approach
unless you upgrade Bitbucket Server first.
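
For the second approach, a small shell sketch that applies the setting
and reads it back. The function name is made up for this example; the
two "git config" invocations are the only real commands involved.

```shell
#!/bin/sh
# Disable auto GC in one repository and confirm the setting is in place.
#   $1: repository directory, e.g. the .../repositories/68 path above
disable_auto_gc() {
    git -C "$1" config gc.auto 0 || return 1
    [ "$(git -C "$1" config --get gc.auto)" = "0" ]
}
```

Re-running only the read-back step later is a cheap way to notice whether
the setting has been reverted, which is what would happen on versions
older than 4.6.0.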

I also want to add that Bitbucket Server 5.x includes totally
rewritten GC handling. 5.0.x automatically disables auto GC in all
repositories and manages it explicitly, and 5.1.x fully removes use of
"git gc" in favor of running relevant plumbing commands directly. We
moved away from "git gc" specifically to avoid the "git reflog expire
--all", because there's no config setting that _fully disables_
forking that process. By default our bare clones only have reflogs for
pull request refs, and we've explicitly configured those to never
expire, so all "git reflog expire --all" can do is use up I/O and,
quite frequently, fail because refs are updated. Since we stopped
running "git gc", we've not yet seen any GC failures on our internal
Bitbucket Server clusters.
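
A minimal shell sketch of the "plumbing commands directly" idea follows.
It is illustrative only, not Bitbucket Server's actual implementation:
the function name and the "prune" switch are invented, and the flags are
simply the ones a stock "git gc" passes to the same commands (visible
with GIT_TRACE=1).

```shell
#!/bin/sh
# Run only the parts of "git gc" that matter for a bare hosting
# repository; "reflog expire", "worktree prune" and "rerere gc" are
# deliberately never invoked.
#   $1: repository directory
#   $2: pass "prune" to also prune unreachable loose objects (leave it
#       off for repositories that share objects with forks)
maintain_repo() {
    git -C "$1" pack-refs --all --prune || return 1
    git -C "$1" repack -d -l -A --unpack-unreachable=2.weeks.ago || return 1
    if [ "${2:-}" = "prune" ]; then
        git -C "$1" prune --expire 2.weeks.ago || return 1
    fi
}
```

Because each step is a separate call, a caller can also run just one of
them, for example only "pack-refs" when a repository has many loose refs.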

Bitbucket Server 5.1.x also includes a new "gc.log" (not to be
confused with the one Git itself writes) which retains a record of
every GC-related process we run in each repository, and how long that
process took to complete. That can be useful for getting clearer
insight into both how often GC work is being done, and how long it's
taking.

Upgrading to 5.x can be a bit of an undertaking, since the major
version brings API changes, so it's totally understandable that many
organizations haven't upgraded yet. I'm just noting that these
improvements are there for when such an upgrade becomes viable.

Hope this helps!
Bryan

>
> I'd wager that each push sees that a GC is in order,
> and doesn't notice that there is one already running.
>
> - Andreas


Re: Flurries of 'git reflog expire'

2017-07-06 Thread Andreas Krey
On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
...
> I seem to recall that using --stale-fix is also extremely expensive.
> What do the command line arguments for the slow commands look like?

The problem isn't that the expire is slow, it is that there are
many of them, waiting for disk writes.

> And what does the process tree look like?

Lots (~ 10) of

  \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |   \_ git reflog expire --all

plus another dozen gc/expire pairs where the parent is already gone.
All with the same arguments - auto GC.

I'd wager that each push sees that a GC is in order,
and doesn't notice that there is one already running.

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: Flurries of 'git reflog expire'

2017-07-06 Thread Andreas Krey
On Tue, 04 Jul 2017 11:43:33 +, Ævar Arnfjörð Bjarmason wrote:
...
> You can set gc.auto=0 in the repo to disable auto-gc, and play with
> e.g. the reflog expire values, see the git-gc manpage.
> 
> But then you need to run your own gc, which is not a bad idea anyway
> with a dedicated git server.

Actually, bitbucket should be doing this. Although I can't quite
rule out the possibility that we reenabled GC in this repo some
time ago.

> But it would be good to get to the bottom of this, we shouldn't be
> running these concurrently.

Indeed. Unfortunately this isn't easily reproduced in the test instance,
so I will need to get a newer git under the production bitbucket.

There are quite a few of

  \_ /usr/bin/git receive-pack /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |   \_ git reflog expire --all

in the process tree; apparently a new one gets started even though previous
ones are still running. The number of running expires grew slowly, over
the course of many minutes.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: Flurries of 'git reflog expire'

2017-07-05 Thread Jeff King
On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:

> Questions:
> 
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristics don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
> 
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

If it's using --expire-unreachable (which a default "git gc" does), that
means we have to traverse the entire history to see what is
reachable and what is not. Added on to a normal git-gc, that's usually
not a big deal (it has to do that traversal and much more for the
repack). But if bitbucket is triggering it for other operations, that
could be related (I don't think anything but gc should ever run it
otherwise).

I seem to recall that using --stale-fix is also extremely expensive.
What do the command line arguments for the slow commands look like?
And what does the process tree look like?

-Peff


Re: Flurries of 'git reflog expire'

2017-07-04 Thread Ævar Arnfjörð Bjarmason

On Tue, Jul 04 2017, Andreas Krey jotted:

> Hi everyone,
>
> how is 'git reflog expire' triggered? We're occasionally seeing a lot
> of them running in parallel on a single one of our repos (atlassian bitbucket),
> and this usually means that the machine is not very responsive for
> twenty minutes, the repo being as big as it is.

Assuming Linux, what does 'ps auxf' look like when this happens? Is the
parent a 'git gc --auto'?

> The server is still on git 2.6.2 (and bitbucket 4.14.5).

You might want to upgrade; we've had a bunch of changes since then,
and some of them may fix it:

git log --reverse -p -L'/^static.*lock_repo_for/,/^}/:builtin/gc.c'

> Questions:
>
> What can be done about this? Cronjob 'git reflog expire' at midnight,
> so the heuristics don't trigger during the day? (The relnotes don't
> mention anything after 2.4.0, so I suppose a git upgrade won't help.)
>
> What is the actual cause? Bad heuristics in git itself, or does
> bitbucket run them too often (improbable)?

You can set gc.auto=0 in the repo to disable auto-gc, and play with
e.g. the reflog expire values, see the git-gc manpage.

But then you need to run your own gc, which is not a bad idea anyway
with a dedicated git server.

But it would be good to get to the bottom of this, we shouldn't be
running these concurrently.


Flurries of 'git reflog expire'

2017-07-04 Thread Andreas Krey
Hi everyone,

how is 'git reflog expire' triggered? We're occasionally seeing a lot
of them running in parallel on a single one of our repos (atlassian bitbucket),
and this usually means that the machine is not very responsive for
twenty minutes, the repo being as big as it is.

The server is still on git 2.6.2 (and bitbucket 4.14.5).

Questions:

What can be done about this? Cronjob 'git reflog expire' at midnight,
so the heuristics don't trigger during the day? (The relnotes don't
mention anything after 2.4.0, so I suppose a git upgrade won't help.)

What is the actual cause? Bad heuristics in git itself, or does
bitbucket run them too often (improbable)?

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800