Re: [PATCH] repack: respect gc.pid lock

2017-04-20 Thread Jeff King
On Thu, Apr 20, 2017 at 08:10:24PM +, David Turner wrote:

> > Is "-a" or "-A" the key factor? Are there current callers who prefer the 
> > current
> > behavior of "possibly duplicate some work, but never report failure" versus 
> > "do
> > not duplicate work, but sometimes fail due to lock contention"?
> 
> One problem with failing is that it can leave a temp pack behind.

Yeah. IMHO we should probably treat failed object and pack writes as
normal tempfiles and remove them (but possibly respect a "debug mode"
that leaves them around). But that's another patch entirely.

> I think the correct fix is to change the default core.packedGitLimit on
> 64-bit machines to 32 terabytes (2**45 bytes).  That's because on
> modern Intel processors, there are 48 bits of address space actually
> available, but the kernel is going to probably reserve a few bits.  My
> machine claims to have 2**46 bytes of virtual address space available.
> It's also several times bigger than any repo that I know of or can
> easily imagine.
> 
> Does that seem reasonable to you?

Yes, it does.

-Peff
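
A minimal sketch of the setting being agreed on above, for anyone who
wants to try it per-repository before any default change lands (the raw
byte value is David's proposed 2**45 spelled out; git's config parser
accepts plain byte counts or k/m/g suffixes):

  # 2**45 bytes = 32 TiB -- effectively "map everything" on 64-bit hosts
  git config core.packedGitLimit 35184372088832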


RE: [PATCH] repack: respect gc.pid lock

2017-04-20 Thread David Turner
> -----Original Message-----
> From: Jeff King [mailto:p...@peff.net]
> Sent: Tuesday, April 18, 2017 1:50 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Tue, Apr 18, 2017 at 05:43:29PM +, David Turner wrote:
> 
> > > A lock can catch the racy cases where both run at the same time. But
> > > I think that even:
> > >
> > >   git -c repack.writeBitmaps=true repack -Ad
> > >   [...wait...]
> > >   git gc
> > >
> > > is questionable, because that gc will erase your bitmaps. How does
> > > git-gc know that it's doing a bad thing by repacking without
> > > bitmaps, and that you didn't simply change your configuration or
> > > want to get rid of them?
> >
> > Sorry, the gc in GitLab does keep bitmaps.  The one I quoted in a
> > previous message doesn't, because the person typing the command was
> > just doing some manual testing and I guess didn't realize that
> > bitmaps were important.  Or perhaps he knew that repack.writeBitmaps
> > was already set in the config.
> 
> Sure, but I guess I'd just wonder what _else_ is different between the
> commands (and if nothing, why are both running).

Presumably, repack is faster, and they're not intended to run concurrently (but
there's a GitLab bug causing them to do so).  But you'll have to ask the GitLab
folks for more details.

> > So given that the lock will catch the races, might it be a good idea
> > (if implemented to avoid locking on repack -d)?
> 
> I'm mildly negative just because it increases complexity, and I don't
> think it's actually buying very much. It's not clear to me which
> invocations of repack would want to lock and which ones wouldn't.
> 
> Is "-a" or "-A" the key factor? Are there current callers who prefer the 
> current
> behavior of "possibly duplicate some work, but never report failure" versus 
> "do
> not duplicate work, but sometimes fail due to lock contention"?

One problem with failing is that it can leave a temp pack behind.

I think the correct fix is to change the default core.packedGitLimit on 64-bit
machines to 32 terabytes (2**45 bytes).  That's because on modern Intel
processors, there are 48 bits of address space actually available, but the
kernel is going to probably reserve a few bits.  My machine claims to have
2**46 bytes of virtual address space available.  It's also several times
bigger than any repo that I know of or can easily imagine.

Does that seem reasonable to you?


Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:43:29PM +, David Turner wrote:

> > A lock can catch the racy cases where both run at the same time. But
> > I think that even:
> > 
> >   git -c repack.writeBitmaps=true repack -Ad
> >   [...wait...]
> >   git gc
> > 
> > is questionable, because that gc will erase your bitmaps. How does
> > git-gc know that it's doing a bad thing by repacking without bitmaps,
> > and that you didn't simply change your configuration or want to get
> > rid of them?
> 
> Sorry, the gc in GitLab does keep bitmaps.  The one I quoted in a previous
> message doesn't, because the person typing the command was just doing some
> manual testing and I guess didn't realize that bitmaps were important.  Or
> perhaps he knew that repack.writeBitmaps was already set in the config.

Sure, but I guess I'd just wonder what _else_ is different between the
commands (and if nothing, why are both running).

> So given that the lock will catch the races, might it be a good idea (if
> implemented to avoid locking on repack -d)?

I'm mildly negative just because it increases complexity, and I don't
think it's actually buying very much. It's not clear to me which
invocations of repack would want to lock and which ones wouldn't.

Is "-a" or "-A" the key factor? Are there current callers who prefer the
current behavior of "possibly duplicate some work, but never report
failure" versus "do not duplicate work, but sometimes fail due to lock
contention"?

-Peff


RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner


> -----Original Message-----
> From: Jeff King [mailto:p...@peff.net]
> Sent: Tuesday, April 18, 2017 1:20 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Tue, Apr 18, 2017 at 05:16:52PM +, David Turner wrote:
> 
> > > -----Original Message-----
> > > From: Jeff King [mailto:p...@peff.net]
> > > Sent: Monday, April 17, 2017 11:42 PM
> > > To: David Turner <david.tur...@twosigma.com>
> > > Cc: git@vger.kernel.org; christian.cou...@gmail.com;
> > > mf...@codeaurora.org; jacob.kel...@gmail.com
> > > Subject: Re: [PATCH] repack: respect gc.pid lock
> > >
> > > On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> > >
> > > > We saw this failure in the logs multiple times (with three
> > > > different shas, while a gc was running):
> > > > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true
> > > > repack -A -d --pack-kept-objects' in [repo] failed:
> > > > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > > > Possibly some other repack was also running at the time as well.
> > > >
> > > > My colleague also saw it while manually doing gc (again while
> > > > repacks were likely to be running):
> > >
> > > This is sort of a side question, but...why are you running other
> > > repacks alongside git-gc? It seems like you ought to be doing one or
> > > the other.
> >
> > But actually, it would be kind of nice if git would help protect us
> > from doing this?
> 
> A lock can catch the racy cases where both run at the same time. But I
> think that even:
> 
>   git -c repack.writeBitmaps=true repack -Ad
>   [...wait...]
>   git gc
> 
> is questionable, because that gc will erase your bitmaps. How does git-gc know
> that it's doing a bad thing by repacking without bitmaps, and that you didn't
> simply change your configuration or want to get rid of them?

Sorry, the gc in GitLab does keep bitmaps.  The one I quoted in a previous
message doesn't, because the person typing the command was just doing some
manual testing and I guess didn't realize that bitmaps were important.  Or
perhaps he knew that repack.writeBitmaps was already set in the config.

So given that the lock will catch the races, might it be a good idea (if
implemented to avoid locking on repack -d)?


Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:16:52PM +, David Turner wrote:

> > -----Original Message-----
> > From: Jeff King [mailto:p...@peff.net]
> > Sent: Monday, April 17, 2017 11:42 PM
> > To: David Turner <david.tur...@twosigma.com>
> > Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> > jacob.kel...@gmail.com
> > Subject: Re: [PATCH] repack: respect gc.pid lock
> > 
> > On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> > 
> > > We saw this failure in the logs multiple times (with three different
> > > shas, while a gc was running):
> > > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true
> > > repack -A -d --pack-kept-objects' in [repo] failed:
> > > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > > Possibly some other repack was also running at the time as well.
> > >
> > > My colleague also saw it while manually doing gc (again while repacks
> > > were likely to be running):
> > 
> > This is sort of a side question, but...why are you running other
> > repacks alongside git-gc? It seems like you ought to be doing one or
> > the other.
> 
> But actually, it would be kind of nice if git would help protect us from 
> doing this?

A lock can catch the racy cases where both run at the same time. But I
think that even:

  git -c repack.writeBitmaps=true repack -Ad
  [...wait...]
  git gc

is questionable, because that gc will erase your bitmaps. How does
git-gc know that it's doing a bad thing by repacking without bitmaps,
and that you didn't simply change your configuration or want to get rid
of them?

-Peff


Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:08:14PM +, David Turner wrote:

> On 64-bit systems, I think core.packedGitLimit doesn't make a 
> lot of sense. There is plenty of address space.  Why not use it?

That's my gut feeling, too. I'd have a slight worry that the OS's paging
behavior may respond differently if we have more memory mapped. But
that's not based on numbers, just a fear of the unknown. :)

If we have infinite windows anyway, I suspect we could just mmap entire
packfiles and forget about all the window complexity in the first place.
Although IIRC some operating systems take a long time to set up large
mmaps, and we may only need a small part of a large pack.

> I'll ask our git server administrator to adjust core.packedGitLimit
> and turn repacks back on to see if that fixes the issue.

Thanks. Let us know if you get any results, either way.

-Peff


RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner
> -----Original Message-----
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 17, 2017 11:42 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> 
> > We saw this failure in the logs multiple times (with three different
> > shas, while a gc was running):
> > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true
> > repack -A -d --pack-kept-objects' in [repo] failed:
> > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > Possibly some other repack was also running at the time as well.
> >
> > My colleague also saw it while manually doing gc (again while repacks
> > were likely to be running):
> 
> This is sort of a side question, but...why are you running other
> repacks alongside git-gc? It seems like you ought to be doing one or
> the other.

But actually, it would be kind of nice if git would help protect us from doing 
this?


RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner
> -----Original Message-----
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 17, 2017 11:42 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> 
> > We saw this failure in the logs multiple times (with three different
> > shas, while a gc was running):
> > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true
> > repack -A -d --pack-kept-objects' in [repo] failed:
> > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > Possibly some other repack was also running at the time as well.
> >
> > My colleague also saw it while manually doing gc (again while repacks
> > were likely to be running):
> 
> This is sort of a side question, but...why are you running other
> repacks alongside git-gc? It seems like you ought to be doing one or
> the other.
>
> I don't begrudge anybody with a complicated setup running their own set
> of gc commands, but I'd think you would want to do locking there, and
> disable auto-gc entirely. Otherwise you're going to get different
> results depending on who gc'd last.

That's what GitLab does, so you'll have to ask them why they do it that way.
From https://gitlab.com/gitlab-org/gitlab-ce/issues/30939#note_27487981
it looks like they may have intended to have a lock but not quite succeeded.
 
> > $ git gc --aggressive
> > Counting objects: 13800073, done.
> > Delta compression using up to 8 threads.
> > Compressing objects:  99% (11465846/11465971)
> > Compressing objects: 100% (11465971/11465971), done.
> > fatal: packfile [repo]/objects/pack/pack-[sha].pack cannot be accessed
> 
> OK, so this presumably happened during the writing phase. Which seems like the
> "a pack was closed, and we couldn't re-open it" problem we've seen before.
> 
> > We have a reasonable rlimit (64k soft limit), so that failure mode is
> > pretty unlikely.  I think we should have had 20 or so packs -- not
> > tens of thousands.
> > [...]
> > Do you have any idea why this would be happening other than the rlimit 
> > thing?
> 
> Yeah, that should be enough (you could double check the return of
> get_max_fd_limit() on your system if you wanted to be paranoid).
> 
> We also keep only a limited number of bytes mmap'd at one time. Normally we
> don't actually close packfiles when we release their mmap windows.
> But I think there is one path that might. When use_pack() maps a pack,
> if the entire pack fits in a single window, then we close it; this is
> due to d131b7afe (sha1_file.c: Don't retain open fds on small packs,
> 2011-03-02).
> 
> But if we ever unmap that window, now we have no handle to the pack.
> Normally on a 64-bit system this wouldn't happen at all, since the default
> core.packedGitLimit is 8GB there.

Aha, I missed that limit while messing around with the code.  That must be it.

> So if you have a few small packs and one very large pack (over 8GB), I
> think this could trigger. We may do the small-pack thing for some of
> them, and then the large pack forces us to drop the mmaps for some of
> the others. When we go back to access the small pack, we find it's gone.
> 
> One solution would be to bump core.packedGitLimit to something much
> higher (it's an mmap, so we're really just chewing up address space;
> it's up to the OS to decide when to load pages from disk and when to
> drop them).
>
> The other alternative is to disable the small-pack closing from d131b7afe. It
> might need to be configurable, or perhaps auto-tuned based on the fd limit.
> Linux systems tend to have generous descriptor limits, but I'm not sure we can
> rely on that. OTOH, it seems like the code to close descriptors when needed
> would take care of things. So maybe we should just revert d131b7afe entirely.

I definitely remember running into fd limits when processing very large
numbers of packs at Twitter, but I don't recall the exact details.
Presumably, d131b7afe was supposed to help with this, but in fact, it did
not totally solve it.  Perhaps we were doing something funny.  Adjusting
the fd limits was the easy fix.

On 64-bit systems, I think core.packedGitLimit doesn't make a 
lot of sense. There is plenty of address space.  Why not use it?

For 32-bit systems, of course, address space is more precious.

I'll ask our git server administrator to adjust core.packedGitLimit
and turn repacks back on to see if that fixes the issue.

> The final thing I'd ask is whether you might be on a networked
> filesystem that would foil our usual "open descriptors mean packs don't
> go away" logic.

Re: [PATCH] repack: respect gc.pid lock

2017-04-17 Thread Jeff King
On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:

> We saw this failure in the logs multiple times (with three different
> shas, while a gc was running):
> April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true repack -A 
> -d --pack-kept-objects' in [repo] failed:
> fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> Possibly some other repack was also running at the time as well.
> 
> My colleague also saw it while manually doing gc (again while 
> repacks were likely to be running):

This is sort of a side question, but...why are you running other repacks
alongside git-gc? It seems like you ought to be doing one or the other.

I don't begrudge anybody with a complicated setup running their own set
of gc commands, but I'd think you would want to do locking there, and
disable auto-gc entirely. Otherwise you're going to get different
results depending on who gc'd last.
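
A sketch of one way a site-specific maintenance job could follow that
advice; flock(1) and the lock path are assumptions about such a setup,
not anything the participants here describe running:

  git config gc.auto 0             # disable auto-gc in this repository
  flock --nonblock /tmp/repack.lock \
      git -c repack.writeBitmaps=true repack -Ad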

> $ git gc --aggressive
> Counting objects: 13800073, done.
> Delta compression using up to 8 threads.
> Compressing objects:  99% (11465846/11465971)   
> Compressing objects: 100% (11465971/11465971), done.
> fatal: packfile [repo]/objects/pack/pack-[sha].pack cannot be accessed

OK, so this presumably happened during the writing phase. Which seems
like the "a pack was closed, and we couldn't re-open it" problem we've
seen before.

> We have a reasonable rlimit (64k soft limit), so that failure mode is pretty 
> unlikely.  I think we should have had 20 or so packs -- not tens of
> thousands.
> [...]
> Do you have any idea why this would be happening other than the rlimit thing?

Yeah, that should be enough (you could double check the return of
get_max_fd_limit() on your system if you wanted to be paranoid).
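
A quick way to compare the two numbers in question (the pack path
assumes a non-bare repository):

  ulimit -n                             # soft limit on open files
  ls .git/objects/pack/*.pack | wc -l   # packs that may each need an fd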

We also keep only a limited number of bytes mmap'd at one time. Normally
we don't actually close packfiles when we release their mmap windows.
But I think there is one path that might. When use_pack() maps a pack,
if the entire pack fits in a single window, then we close it; this is
due to d131b7afe (sha1_file.c: Don't retain open fds on small packs,
2011-03-02).

But if we ever unmap that window, now we have no handle to the pack.
Normally on a 64-bit system this wouldn't happen at all, since the
default core.packedGitLimit is 8GB there.

So if you have a few small packs and one very large pack (over 8GB), I
think this could trigger. We may do the small-pack thing for some of
them, and then the large pack forces us to drop the mmaps for some of
the others. When we go back to access the small pack, we find it's gone.

One solution would be to bump core.packedGitLimit to something much
higher (it's an mmap, so we're really just chewing up address space;
it's up to the OS to decide when to load pages from disk and when to
drop them).

The other alternative is to disable the small-pack closing from
d131b7afe. It might need to be configurable, or perhaps auto-tuned based
on the fd limit. Linux systems tend to have generous descriptor limits,
but I'm not sure we can rely on that. OTOH, it seems like the code to
close descriptors when needed would take care of things. So maybe we
should just revert d131b7afe entirely.

The final thing I'd ask is whether you might be on a networked
filesystem that would foil our usual "open descriptors mean packs don't
go away" logic. But after having dug into the details above, I have a
feeling the answer is simply that you have repositories >8GB.
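
A sketch of how to test that guess on a suspect repository (assumes GNU
sort; the path assumes a non-bare layout):

  # a single pack over core.packedGitLimit (8GB default on 64-bit) can
  # evict the windows of small packs whose fds were already closed
  du -h .git/objects/pack/*.pack | sort -h | tail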

And if that is the case, then yeah, your locking patch is definitely a
band-aid. If you fetch and repack at the same time, you'll eventually
see a racy failed fetch.

-Peff


RE: [PATCH] repack: respect gc.pid lock

2017-04-17 Thread David Turner

> -----Original Message-----
> From: Jeff King [mailto:p...@peff.net]
> Sent: Friday, April 14, 2017 3:34 PM
> To: David Turner <david.tur...@twosigma.com>
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Thu, Apr 13, 2017 at 04:27:12PM -0400, David Turner wrote:
> 
> > Git gc locks the repository (using a gc.pid file) so that other gcs
> > don't run concurrently. Make git repack respect this lock.
> >
> > Now repack, by default, will refuse to run at the same time as a gc.
> > This fixes a concurrency issue: a repack which deleted packs would
> > make a concurrent gc sad when its packs were deleted out from under
> > it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
> > cannot be accessed".  Then it would die, probably leaving a large temp
> > pack hanging around.
> >
> > Git repack learns --no-lock, so that when run under git gc, it doesn't
> > attempt to manage the lock itself.
> 
> This also means that two repack invocations cannot run simultaneously,
> because they want to take the same lock.  But depending on the options,
> the two don't necessarily conflict. For example, two simultaneous
> incremental "git repack -d" invocations should be able to complete.
> 
> Do we know where the error message is coming from? I couldn't find the error
> message you've given above; grepping for "cannot be accessed"
> shows only error messages that would have "packfile" after the "fatal:".
> Is it a copy-paste error?

Yes, it is.  Sorry.

We saw this failure in the logs multiple times (with three different
shas, while a gc was running):
April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true repack -A -d 
--pack-kept-objects' in [repo] failed:
fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
Possibly some other repack was also running at the time as well.

My colleague also saw it while manually doing gc (again while 
repacks were likely to be running):
$ git gc --aggressive
Counting objects: 13800073, done.
Delta compression using up to 8 threads.
Compressing objects:  99% (11465846/11465971)   
Compressing objects: 100% (11465971/11465971), done.
fatal: packfile [repo]/objects/pack/pack-[sha].pack cannot be accessed

(yes, I know that --aggressive is usually not desirable)

> If that's the case, then it's the one in use_pack(). Do we know what
> program/operation is causing the error? Having a simultaneous gc delete a
> packfile is _supposed_ to work, through a combination of:
> 
>   1. Most sha1-access operations can re-scan the pack directory if they
>  find the packfile went away.
> 
>   2. The pack-objects run by a simultaneous repack is somewhat special
>  in that once it finds and commits to a copy of an object in a pack,
>  we need to use exactly that pack, because we record its offset,
>  delta representation, etc. Usually this works because we open and
>  mmap the packfile before making that commitment, and open packfiles
>  are only closed if you run out of file descriptors (which should
>  only happen when you have a huge number of packs).

We have a reasonable rlimit (64k soft limit), so that failure mode is pretty 
unlikely.  I think we should have had 20 or so packs -- not tens of thousands.

> So I'm worried that this repack lock is going to regress some other
> cases that run fine together. But I'm also worried that it's a band-aid
> over a more subtle problem. If pack-objects is not able to run alongside
> a gc, then you're also going to have problems serving fetches, and
> obviously you wouldn't want to take a lock there.

I see your point.  I don't know if it's pack-objects that's seeing this,
although maybe that's the only reasonable codepath.

I did some tracing through the code, and couldn't figure out how to trigger
that error message.  It appears in two places in the code, but only when the
pack is not initialized.  But the packs always seem to be set up by that point
in my test runs.  It's worth noting that I'm not testing on the GitLab server;
I'm testing on my laptop with a completely different repo.  But I've tried
various ways to repro this -- or even to get to a point where those errors
would have been thrown given a missing pack -- and I have not been able to.

Do you have any idea why this would be happening other than the rlimit thing?



Re: [PATCH] repack: respect gc.pid lock

2017-04-14 Thread Jeff King
On Thu, Apr 13, 2017 at 04:27:12PM -0400, David Turner wrote:

> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. Make git repack respect this lock.
> 
> Now repack, by default, will refuse to run at the same time as a gc.
> This fixes a concurrency issue: a repack which deleted packs would
> make a concurrent gc sad when its packs were deleted out from under
> it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
> cannot be accessed".  Then it would die, probably leaving a large temp
> pack hanging around.
> 
> Git repack learns --no-lock, so that when run under git gc, it doesn't
> attempt to manage the lock itself.

This also means that two repack invocations cannot run simultaneously,
because they want to take the same lock.  But depending on the options,
the two don't necessarily conflict. For example, two simultaneous
incremental "git repack -d" invocations should be able to complete.

Do we know where the error message is coming from? I couldn't find the
error message you've given above; grepping for "cannot be accessed"
shows only error messages that would have "packfile" after the "fatal:".
Is it a copy-paste error?

If that's the case, then it's the one in use_pack(). Do we know what
program/operation is causing the error? Having a simultaneous gc delete
a packfile is _supposed_ to work, through a combination of:

  1. Most sha1-access operations can re-scan the pack directory if they
 find the packfile went away.

  2. The pack-objects run by a simultaneous repack is somewhat special
 in that once it finds and commits to a copy of an object in a pack,
 we need to use exactly that pack, because we record its offset,
 delta representation, etc. Usually this works because we open and
 mmap the packfile before making that commitment, and open packfiles
 are only closed if you run out of file descriptors (which should
 only happen when you have a huge number of packs).

So I'm worried that this repack lock is going to regress some other
cases that run fine together. But I'm also worried that it's a band-aid
over a more subtle problem. If pack-objects is not able to run alongside
a gc, then you're also going to have problems serving fetches, and
obviously you wouldn't want to take a lock there.

-Peff


Re: [PATCH] repack: respect gc.pid lock

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 1:27 PM, David Turner  wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. Make git repack respect this lock.
>
> Now repack, by default, will refuse to run at the same time as a gc.
> This fixes a concurrency issue: a repack which deleted packs would
> make a concurrent gc sad when its packs were deleted out from under
> it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
> cannot be accessed".  Then it would die, probably leaving a large temp
> pack hanging around.
>
> Git repack learns --no-lock, so that when run under git gc, it doesn't
> attempt to manage the lock itself.
>
> Martin Fick suggested just moving the lock into git repack, but this
> would leave parts of git gc (e.g. git prune) protected by only local
> locks.  I worried that a prune (part of git gc) concurrent with a
> repack could confuse the repack, so I decided to go with this
> solution.
>

The last paragraph could be reworded to be a bit less personal and
more as a direct statement of why moving the lock entirely to repack
is a bad idea.

> Signed-off-by: David Turner 
> ---
>  Documentation/git-repack.txt |  5 +++
>  Makefile |  1 +
>  builtin/gc.c | 72 ++--
>  builtin/repack.c | 13 +++
>  repack.c | 88 
> 
>  repack.h |  8 
>  t/t7700-repack.sh|  8 
>  7 files changed, 127 insertions(+), 68 deletions(-)
>  create mode 100644 repack.c
>  create mode 100644 repack.h
>
> diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> index 26afe6ed54..b347ff5c62 100644
> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -143,6 +143,11 @@ other objects in that pack they already have locally.
> being removed. In addition, any unreachable loose objects will
> be packed (and their loose counterparts removed).
>
> +--no-lock::
> +   Do not lock the repository, and do not respect any existing lock.
> +   Mostly useful for running repack within git gc.  Do not use this
> +   unless you know what you are doing.
> +

I would have phrased this more like:

  Used internally by git gc to call git repack while already holding
the lock. Do not use unless you know what you're doing.


[PATCH] repack: respect gc.pid lock

2017-04-13 Thread David Turner
Git gc locks the repository (using a gc.pid file) so that other gcs
don't run concurrently. Make git repack respect this lock.

Now repack, by default, will refuse to run at the same time as a gc.
This fixes a concurrency issue: a repack which deleted packs would
make a concurrent gc sad when its packs were deleted out from under
it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
cannot be accessed".  Then it would die, probably leaving a large temp
pack hanging around.

Git repack learns --no-lock, so that when run under git gc, it doesn't
attempt to manage the lock itself.
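
For context, the lock in question is an ordinary file in the git dir,
and it records its owner (the format matches the fscanf() in the
lock_repo_for_gc() code moved out of builtin/gc.c below):

  cat .git/gc.pid   # "<pid> <hostname>" while a gc holds the lock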

Martin Fick suggested just moving the lock into git repack, but this
would leave parts of git gc (e.g. git prune) protected by only local
locks.  I worried that a prune (part of git gc) concurrent with a
repack could confuse the repack, so I decided to go with this
solution.

Signed-off-by: David Turner 
---
 Documentation/git-repack.txt |  5 +++
 Makefile |  1 +
 builtin/gc.c | 72 ++--
 builtin/repack.c | 13 +++
 repack.c | 88 
 repack.h |  8 
 t/t7700-repack.sh|  8 
 7 files changed, 127 insertions(+), 68 deletions(-)
 create mode 100644 repack.c
 create mode 100644 repack.h

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 26afe6ed54..b347ff5c62 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -143,6 +143,11 @@ other objects in that pack they already have locally.
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).
 
+--no-lock::
+   Do not lock the repository, and do not respect any existing lock.
+   Mostly useful for running repack within git gc.  Do not use this
+   unless you know what you are doing.
+
 Configuration
 -
 
diff --git a/Makefile b/Makefile
index 9b36068ac5..7095f03959 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
+LIB_OBJS += repack.o
 LIB_OBJS += replace_object.o
 LIB_OBJS += rerere.o
 LIB_OBJS += resolve-undo.o
diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57bb..9b9c27020b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -18,6 +18,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "repack.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -45,7 +46,6 @@ static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
 
-static struct tempfile pidfile;
 static struct lock_file log_lock;
 
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -234,70 +234,6 @@ static int need_to_gc(void)
return 1;
 }
 
-/* return NULL on success, else hostname running the gc */
-static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
-{
-   static struct lock_file lock;
-   char my_host[128];
-   struct strbuf sb = STRBUF_INIT;
-   struct stat st;
-   uintmax_t pid;
-   FILE *fp;
-   int fd;
-   char *pidfile_path;
-
-   if (is_tempfile_active())
-   /* already locked */
-   return NULL;
-
-   if (gethostname(my_host, sizeof(my_host)))
-   xsnprintf(my_host, sizeof(my_host), "unknown");
-
-   pidfile_path = git_pathdup("gc.pid");
-   fd = hold_lock_file_for_update(&lock, pidfile_path,
-  LOCK_DIE_ON_ERROR);
-   if (!force) {
-   static char locking_host[128];
-   int should_exit;
-   fp = fopen(pidfile_path, "r");
-   memset(locking_host, 0, sizeof(locking_host));
-   should_exit =
-   fp != NULL &&
-   !fstat(fileno(fp), &st) &&
-   /*
-* 12 hour limit is very generous as gc should
-* never take that long. On the other hand we
-* don't really need a strict limit here,
-* running gc --auto one day late is not a big
-* problem. --force can be used in manual gc
-* after the user verifies that no gc is
-* running.
-*/
-   time(NULL) - st.st_mtime <= 12 * 3600 &&
-   fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
-   /* be gentle to concurrent "gc" on remote hosts */
-   (strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
-   if (fp != NULL)
-   fclose(fp);
-   if (should_exit) {
-   if