Re: Missing Refs after Garbage Collection

2012-12-22 Thread Dmitry Potapov
Hi,

On Sat, Dec 22, 2012 at 5:41 AM, Earl Gresh egr...@codeaurora.org wrote:

 Is anyone familiar with git gc deleting valid references? I'm running
 git version 1.7.8. Have there been any patches in later git releases
 that might address this issue ( if it is a git problem )?

I have not seen any relevant changes in git. I have looked at the code,
and what git-gc runs is git pack-refs --all --prune, which is very
careful to pack and fsync the new file with all packed references
before deleting anything. Only those references that were packed can be
deleted. Also, it does not matter whether a reference is valid or not,
or whether it is stored in refs/changes or in some other place, like
refs/heads. So if references were really lost as you described, I think
other people would have noticed by now.
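
To make the ordering concrete, here is a minimal Python sketch of a pack-then-prune sequence. It is a toy model and not git's actual C code: the function names, the ".new" temporary file name, and the simplified "sha refname" parsing (which skips header and peeled-tag lines) are assumptions made for illustration only.

```python
import os


def read_packed(path):
    """Parse 'sha refname' lines; header and peeled lines are ignored here."""
    refs = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    refs[parts[1]] = parts[0]
    return refs


def pack_and_prune(git_dir):
    packed_path = os.path.join(git_dir, "packed-refs")
    refs = read_packed(packed_path)

    # Collect loose refs; a loose value overrides an older packed one.
    loose = {}
    for dirpath, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        for name in files:
            path = os.path.join(dirpath, name)
            refname = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path) as f:
                loose[refname] = f.read().strip()
    refs.update(loose)

    # Step 1: write the complete new file and force it to disk.
    tmp_path = packed_path + ".new"  # hypothetical temporary name
    with open(tmp_path, "w") as f:
        for refname in sorted(refs):
            f.write(f"{refs[refname]} {refname}\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, packed_path)

    # Step 2: only now delete loose refs, and only those just packed
    # whose on-disk value still matches what was written above.
    for refname, sha in loose.items():
        path = os.path.join(git_dir, *refname.split("/"))
        with open(path) as f:
            unchanged = f.read().strip() == sha
        if unchanged:
            os.remove(path)
    return refs
```

The point of the ordering is that a crash between the two steps leaves every ref present in at least one place, either loose or packed.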

The only plausible explanation that comes to my mind now is that file
creation using O_EXCL is not atomic on your system. In that case the
lock would not work, and one process could overwrite packed references
created by another.
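
For reference, the locking idea relied on here can be sketched as exclusive file creation. This is a minimal Python illustration, not git's lockfile code; the helper name try_lock is hypothetical.

```python
import os


def try_lock(path):
    """Return an open fd if we created the lock file, or None if it exists.

    O_CREAT | O_EXCL asks the OS to fail if the file is already there, so
    on a filesystem where that check is atomic only one process can win.
    """
    try:
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    except FileExistsError:
        return None
```

A caller would try something like try_lock("packed-refs.lock"), back off when it gets None, and remove or rename the lock file when done. If the create-if-absent check is not atomic, two callers can both get an fd, which is the failure mode suggested above.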


Dmitry
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Missing Refs after Garbage Collection

2012-12-22 Thread Jeff King
On Fri, Dec 21, 2012 at 05:41:43PM -0800, Earl Gresh wrote:

 I have observed that after running GC, one particular git repository
 ended up with some missing refs in the refs/changes/* namespace that
 Gerrit uses for storing patch sets. The refs were valid and should not
 have been pruned. Concerned about losing data, GC is still enabled
 but ref packing is turned off. Now the number of refs has grown to the
 point that it's causing performance problems when cloning the project.
 
 Is anyone familiar with git gc deleting valid references? I'm running
 git version 1.7.8. Have there been any patches in later git releases
 that might address this issue ( if it is a git problem )?

I have never seen deletion, but I did recently find a race condition
with ref packing that caused rewinds, where:

  1. Two processes simultaneously repack the refs.

  2. At least one process is using an old version of the packed-refs
     file. That is, it cached the packed refs list earlier in the
     process and is now rewriting it based on that cached notion.

  3. The first process takes the lock, packs refs, drops the
     lock, and then deletes the loose versions. The simultaneous packer
     then takes the lock, overwrites the packed-refs file with a stale
     copy from its memory, and then releases the lock. We're left with
     the stale copy in packed-refs, and deleted loose refs.

In my case, it looked like a rewind, because the stale, memory-cached
refs had the old version. But if you have a ref which was not previously
packed, it would appear to have been deleted.
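
The interleaving above can be replayed deterministically with a small in-memory model. This is only a sketch: the Repo class, the ref names, and the "old"/"new"/"abc" values are made up for illustration, and real processes would of course touch files rather than dictionaries.

```python
class Repo:
    """Toy repository: 'packed' models packed-refs, 'loose' the loose refs."""

    def __init__(self, packed, loose):
        self.packed = dict(packed)
        self.loose = dict(loose)

    def resolve(self, name):
        # A loose ref takes precedence over a packed entry.
        return self.loose.get(name, self.packed.get(name))


def simulate_race():
    repo = Repo(
        packed={"refs/heads/master": "old"},
        loose={"refs/heads/master": "new", "refs/changes/01/1/1": "abc"},
    )

    # Process B reads and caches the packed refs early in its lifetime.
    cache_b = dict(repo.packed)

    # Process A: takes the lock, packs everything, drops the lock,
    # then deletes the loose versions.
    merged = dict(repo.packed)
    merged.update(repo.loose)
    repo.packed = merged
    repo.loose.clear()

    # Process B: takes the lock and rewrites packed-refs from its stale
    # cache (here while deleting some unrelated ref), then drops the lock.
    cache_b.pop("refs/heads/topic", None)
    repo.packed = dict(cache_b)

    return repo.resolve("refs/heads/master"), repo.resolve("refs/changes/01/1/1")
```

Calling simulate_race() returns ("old", None): the ref that was already packed rewinds to its old value, and the ref that existed only as a loose file can no longer be resolved at all.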

The tricky thing about triggering this race is that step (2) needs a
process which has previously read and cached the packed-refs, and then
decided to pack the refs. The git pack-refs command does not do this,
because it starts, packs the refs, and exits. But processes which delete
a ref need to rewrite the packed-refs file (omitting the deleted ref),
and depending on the process, may have previously read and cached the
packed refs file. The obvious candidate is receive-pack.
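
One way to avoid relying on a stale cache when deleting a ref is to read packed-refs only after the lock is held. The Python sketch below illustrates that idea under simplified assumptions (plain "sha refname" lines, optionally followed by a "^" peeled line); it is not taken from the patch linked below, and the function name is used purely for illustration.

```python
import os


def repack_without_ref(git_dir, refname):
    """Rewrite packed-refs without `refname`, reading it only after locking."""
    packed_path = os.path.join(git_dir, "packed-refs")
    lock_path = packed_path + ".lock"
    # O_EXCL makes creation fail if another process already holds the lock.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        with os.fdopen(fd, "w") as out:
            # Read the current on-disk contents now, not a copy cached earlier.
            with open(packed_path) as current:
                skip_peeled = False
                for line in current:
                    if skip_peeled and line.startswith("^"):
                        continue  # peeled value belonging to the dropped ref
                    skip_peeled = False
                    parts = line.split()
                    if len(parts) == 2 and parts[1] == refname:
                        skip_peeled = True
                        continue
                    out.write(line)
            out.flush()
            os.fsync(out.fileno())
        # Renaming the lock file over packed-refs publishes the new
        # contents and releases the lock in one step.
        os.replace(lock_path, packed_path)
    except BaseException:
        if os.path.exists(lock_path):
            os.remove(lock_path)
        raise
```

Because the file is read while the lock is held, anything another packer wrote before releasing the lock is carried over instead of being overwritten.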

So this may be your culprit if:

  1. This is a repo people are pushing into via C git.

  2. You simultaneously run git pack-refs (or git gc) while people
     may be pushing.

You mentioned Gerrit, so I wonder if people are actually pushing via C
git (I thought it used JGit entirely). Or perhaps JGit has the same bug.
My fix (which is not yet released in any git version) is here:

  http://article.gmane.org/gmane.comp.version-control.git/211956

-Peff