Re: Git gc removes all packs

2015-02-27 Thread Jeff King
On Fri, Feb 27, 2015 at 11:16:09AM +0100, Dmitry Neverov wrote:

 I followed your advice and removed a symlink ref from my repository,
 but it didn't help: automatic GC has just removed all packs again. Could
 alternates cause such behavior? Is there any way to make gc log
 why it removes packs?

If you have two repositories, A and B, and A points to B via alternates,
then you cannot safely run git gc in B unless it knows about all of
the refs in A. As we discussed before, symlinking the refs is not
enough, because those symlinks get stale. But nor is removing the
symlinks and just not knowing about the refs. :)

The only safe thing to do is to fetch all of the refs from A into B just
before running the gc (and consequently, you probably want to disable
gc.auto in B).
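A minimal sketch of that recipe, to be run from inside B. It assumes A is reachable as a local path; the helper name gc_with_alternate_refs and the refs/borrowed/ namespace are hypothetical, chosen only so A's refs cannot collide with B's own.

```sh
# gc_with_alternate_refs PATH_TO_A
# Hypothetical helper, run inside B (the repository whose objects A
# borrows through objects/info/alternates).
gc_with_alternate_refs() {
    local alt_repo=$1

    # Never let an automatic "git gc --auto" run in B; it would not
    # know about A's refs.
    git config gc.auto 0

    # Copy every ref from A into a private namespace in B, so objects
    # reachable only from A become reachable from B as well. --prune
    # drops borrowed refs that A has since deleted.
    git fetch --quiet --prune "$alt_repo" '+refs/*:refs/borrowed/*' || return 1

    # Only now is it safe to compute reachability and repack.
    git gc --quiet
}
```

The copied refs live only in B; A itself is not modified.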

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git gc removes all packs

2015-02-27 Thread Dmitry Neverov
I followed your advice and removed a symlink ref from my repository,
but it didn't help: automatic GC has just removed all packs again. Could
alternates cause such behavior? Is there any way to make gc log
why it removes packs?

On Thu, Feb 5, 2015 at 9:03 PM, Jeff King p...@peff.net wrote:
 On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:

 I'm using git p4 for synchronization with Perforce. Sometimes after 'git
 p4 rebase' git starts a garbage collection. When gc finishes, the local
 repository contains no pack files, only loose objects, so I have to
 re-import the repository from Perforce. It also doesn't contain the
 temporary pack git gc was creating.

 It sounds like git didn't find any refs; it will pack only objects which
 are reachable. Unreachable objects are either:

   1. Exploded into loose objects if the mtime of the pack that contains
  them is less than 2 weeks old (and will eventually expire when they
  become 2 weeks old).

   2. Dropped completely if older than 2 weeks.

 One more thing about my setup: since git p4 encourages a linear
 history, I use a separate repository for another branch in Perforce. In
 order to be able to cherry-pick between repositories, I added the other
 repo's objects dir as an alternate and also added a ref which is a
 symbolic link to a branch in the other repo (so I don't have to do any
 fetches).

 You can't symlink refs like this. The loose refs in the filesystem may
 be migrated into the packed-refs file, at which point your symlink
 will be broken. That is a likely reason why git would not find any refs.

 So your setup will not ever work reliably.  But IMHO, it is a bug that
 git does not notice the broken symlink and abort an operation which is
 computing reachability in order to drop objects. As you noticed, it
 means a misconfiguration or filesystem error results in data loss.

 -Peff


Re: Git gc removes all packs

2015-02-17 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 On 02/17/2015 10:57 PM, Junio C Hamano wrote:
 ...
 Do you mean that we would end up reading refs/heads/hold if the user
 did this:
 
 git rev-parse --verify HEAD >precious
 ln -s ../../../precious .git/refs/heads/hold
 
 because that symbolic link does not begin with refs/,

 Correct, you can do exactly that. The hold reference is resolvable and
 listable using for-each-ref. But if I try to update it, the contents
 of the precious file are overwritten. On the other hand, if I run
 pack-refs, then the current value of the hold reference is moved to
 packed-refs and the symlink is removed. This behavior is not sane.

 and is an
 accident waiting to happen, so we should forbid it in the longer
 term, and issuing a warning when we see it would be the first step?

 Yes, I am proposing that approach, though if somebody can suggest a use
 case I'm willing to be convinced otherwise.

Thanks.  I agree the proposed tightening is probably harmless, but I
too would want to see if somebody comes up with a valid use case.  I
do not think of anything offhand.



Re: Git gc removes all packs

2015-02-17 Thread Jeff King
On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:

  You can't symlink refs like this. The loose refs in the filesystem may
  be migrated into the packed-refs file, at which point your symlink
  will be broken. That is a likely reason why git would not find any refs.
  
  So your setup will not ever work reliably.  But IMHO, it is a bug that
  git does not notice the broken symlink and abort an operation which is
  computing reachability in order to drop objects. As you noticed, it
  means a misconfiguration or filesystem error results in data loss.
 
 There's a bunch of code in refs.c that is there explicitly for reading
 loose references that are symlinks. If the link contents literally start
 with refs/, then they are read and treated as a symbolic ref.
 Otherwise, the symlink is just followed.

Right, but we should be able to notice that:

  1. We found a symlink.

  2. We couldn't read its ref value (because it's a broken link).

I think we _do_ notice that at the lowest level, and set REF_ISBROKEN.
But the problem is that the reachability code in prune and in
pack-objects (triggered by repack -ad) uses for_each_ref, and not
for_each_rawref. So they ignore broken refs rather than complaining,
even though failing to read a ref may mean we could drop objects which
were only mentioned by that ref.
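A toy model of that difference, in shell rather than git's C code: ref_tips (a hypothetical helper) walks a loose refs/ directory and prints each ref's value. By default it silently skips refs it cannot read, the way the reachability code above is described as behaving; the hypothetical --strict flag stops instead. Packed refs, symbolic refs and value validation are deliberately left out.

```sh
# ref_tips [--strict] REFS_DIR
# Toy model: print "<value> <path>" for every loose ref under REFS_DIR.
ref_tips() {
    local strict=0 refs_dir ref
    if [ "$1" = "--strict" ]; then strict=1; shift; fi
    refs_dir=$1

    # "-type f" alone would miss symlinks (intact or dangling), so list both.
    for ref in $(find "$refs_dir" \( -type f -o -type l \) | sort); do
        if [ -f "$ref" ] && [ -r "$ref" ]; then
            printf '%s %s\n' "$(head -n 1 "$ref")" "$ref"
        elif [ "$strict" -eq 1 ]; then
            # Strict: an unreadable ref aborts the whole walk.
            echo "fatal: cannot read ref $ref" >&2
            return 1
        fi
        # Lenient: skip it, so its objects are never marked reachable.
    done
}
```

With one healthy ref and one dangling symlink, the default walk reports a single tip, so anything reachable only through the dangling ref would look unreachable to a later prune; the strict walk fails before any such decision is made.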

 It is still possible to write symbolic refs that are represented as
 symlinks (see core.preferSymlinkRefs), but that backwards-compatibility
 code was added in 2006(!) Maybe it's time to deprecate it. And maybe we
 should start working towards a future where any symlinks under refs
 cause git to complain.

I wouldn't mind seeing all of the symlink code go away, but I think it
is orthogonal to the problem I mentioned.

-Peff


Re: Git gc removes all packs

2015-02-17 Thread Michael Haggerty
On 02/05/2015 09:03 PM, Jeff King wrote:
 On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:
 [...]
 One more thing about my setup: since git p4 encourages a linear
 history, I use a separate repository for another branch in Perforce. In
 order to be able to cherry-pick between repositories, I added the other
 repo's objects dir as an alternate and also added a ref which is a
 symbolic link to a branch in the other repo (so I don't have to do any
 fetches).
 
 You can't symlink refs like this. The loose refs in the filesystem may
 be migrated into the packed-refs file, at which point your symlink
 will be broken. That is a likely reason why git would not find any refs.
 
 So your setup will not ever work reliably.  But IMHO, it is a bug that
 git does not notice the broken symlink and abort an operation which is
 computing reachability in order to drop objects. As you noticed, it
 means a misconfiguration or filesystem error results in data loss.

There's a bunch of code in refs.c that is there explicitly for reading
loose references that are symlinks. If the link contents literally start
with refs/, then they are read and treated as a symbolic ref.
Otherwise, the symlink is just followed.
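The same rule restated as a shell sketch. classify_loose_ref is a hypothetical helper; it only checks the refs/ prefix of the link text and does not validate the rest of the refname.

```sh
# classify_loose_ref PATH
# Hypothetical helper: print how the rule above treats the loose ref at PATH.
classify_loose_ref() {
    local ref=$1 target
    if [ ! -L "$ref" ]; then
        # Not a symlink: an ordinary ref file, or no loose ref at all.
        if [ -e "$ref" ]; then echo "regular"; else echo "missing"; fi
        return 0
    fi
    target=$(readlink "$ref")
    case $target in
        refs/*)
            # Link text names another ref: treated as a symbolic ref.
            echo "symref $target" ;;
        *)
            # Anything else is followed like an ordinary path...
            if [ -e "$ref" ]; then
                echo "followed"
            else
                # ...unless it leads nowhere, leaving nothing to read.
                echo "broken"
            fi ;;
    esac
}
```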

It is still possible to write symbolic refs that are represented as
symlinks (see core.preferSymlinkRefs), but that backwards-compatibility
code was added in 2006(!) Maybe it's time to deprecate it. And maybe we
should start working towards a future where any symlinks under refs
cause git to complain.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: Git gc removes all packs

2015-02-17 Thread Michael Haggerty
On 02/17/2015 05:55 PM, Jeff King wrote:
 On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:
 
 You can't symlink refs like this. The loose refs in the filesystem may
 be migrated into the packed-refs file, at which point your symlink
 will be broken. That is a likely reason why git would not find any refs.

 So your setup will not ever work reliably.  But IMHO, it is a bug that
 git does not notice the broken symlink and abort an operation which is
 computing reachability in order to drop objects. As you noticed, it
 means a misconfiguration or filesystem error results in data loss.

 There's a bunch of code in refs.c that is there explicitly for reading
 loose references that are symlinks. If the link contents literally start
 with refs/, then they are read and treated as a symbolic ref.
 Otherwise, the symlink is just followed.
 
 Right, but we should be able to notice that:
 
   1. We found a symlink.
 
   2. We couldn't read its ref value (because it's a broken link).
 
 I think we _do_ notice that at the lowest level, and set REF_ISBROKEN.
 But the problem is that the reachability code in prune and in
 pack-objects (triggered by repack -ad) uses for_each_ref, and not
 for_each_rawref. So they ignore broken refs rather than complaining,
 even though failing to read a ref may mean we could drop objects which
 were only mentioned by that ref.

Yes, this makes sense too. But my point was that sticking symlinks to
random files in your refs hierarchy is pretty questionable even *before*
the symlink gets broken. If we would warn the user as soon as we saw
such a thing, then the user's problem would never have advanced as far
as it did. Do you think that emitting warnings on *intact* symlinks is
too draconian?
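One possible shape for such a warning, as a shell sketch that audits a repository's refs/ directory and complains about every symlink, intact or broken. warn_symlinked_refs and its message text are hypothetical and not taken from git.

```sh
# warn_symlinked_refs GIT_DIR
# Warn about every symlink under GIT_DIR/refs; return 1 if any was found.
warn_symlinked_refs() {
    local git_dir=$1 found=0 link state
    for link in $(find "$git_dir/refs" -type l | sort); do
        found=1
        # -e follows the link, so it distinguishes intact from dangling.
        if [ -e "$link" ]; then state="intact"; else state="broken"; fi
        echo "warning: $state symlink in refs: ${link#"$git_dir"/} -> $(readlink "$link")" >&2
    done
    [ "$found" -eq 0 ]
}
```

A wrapper script could run it before git gc and refuse to continue on a non-zero status.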

 [...]

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: Git gc removes all packs

2015-02-17 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 On 02/17/2015 05:55 PM, Jeff King wrote:
 On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:
 
 There's a bunch of code in refs.c that is there explicitly for reading
 loose references that are symlinks. If the link contents literally start
 with refs/, then they are read and treated as a symbolic ref.
 Otherwise, the symlink is just followed.
 ...
 Yes, this makes sense too. But my point was that sticking symlinks to
 random files in your refs hierarchy is pretty questionable even *before*
 the symlink gets broken. If we would warn the user as soon as we saw
 such a thing, then the user's problem would never have advanced as far
 as it did. Do you think that emitting warnings on *intact* symlinks is
 too draconian?

Do you mean that we would end up reading refs/heads/hold if the user
did this:

git rev-parse --verify HEAD >precious
ln -s ../../../precious .git/refs/heads/hold

because that symbolic link does not begin with refs/, and is an
accident waiting to happen, so we should forbid it in the longer
term, and issuing a warning when we see it would be the first step?


Re: Git gc removes all packs

2015-02-17 Thread Michael Haggerty
On 02/17/2015 10:57 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 On 02/17/2015 05:55 PM, Jeff King wrote:
 On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:

 There's a bunch of code in refs.c that is there explicitly for reading
 loose references that are symlinks. If the link contents literally start
 with refs/, then they are read and treated as a symbolic ref.
 Otherwise, the symlink is just followed.
 ...
 Yes, this makes sense too. But my point was that sticking symlinks to
 random files in your refs hierarchy is pretty questionable even *before*
 the symlink gets broken. If we would warn the user as soon as we saw
 such a thing, then the user's problem would never have advanced as far
 as it did. Do you think that emitting warnings on *intact* symlinks is
 too draconian?
 
 Do you mean that we would end up reading refs/heads/hold if the user
 did this:
 
 git rev-parse --verify HEAD >precious
 ln -s ../../../precious .git/refs/heads/hold
 
 because that symbolic link does not begin with refs/,

Correct, you can do exactly that. The hold reference is resolvable and
listable using for-each-ref. But if I try to update it, the contents
of the precious file are overwritten. On the other hand, if I run
pack-refs, then the current value of the hold reference is moved to
packed-refs and the symlink is removed. This behavior is not sane.
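A throwaway reproduction of the listing and pack-refs observations, assuming the default files ref backend; the update-overwrites-precious case is left out. The temporary repository and the placeholder identity are made up for the demo.

```sh
# Throwaway repository with one commit; the identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m base

# The recipe from the thread: a loose ref that is a symlink to a plain file.
git rev-parse --verify HEAD >precious
ln -s ../../../precious .git/refs/heads/hold

# The link text does not start with refs/, so git follows it and the
# ref resolves and is listed like any other branch.
git for-each-ref --format='%(refname) %(objectname)' refs/heads/hold

# pack-refs copies hold's current value into packed-refs and deletes the
# loose ref, i.e. the symlink itself; precious stays behind, detached
# from the ref.
git pack-refs --all
[ -L .git/refs/heads/hold ] || echo "refs/heads/hold is no longer a symlink"
grep ' refs/heads/hold$' .git/packed-refs
```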

 and is an
 accident waiting to happen, so we should forbid it in the longer
 term, and issuing a warning when we see it would be the first step?

Yes, I am proposing that approach, though if somebody can suggest a use
case I'm willing to be convinced otherwise. The only thing I can imagine
symlinks being useful for might be to temporarily create a fake repo,
run one or two specific known-safe commands, then delete the repo again.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: Git gc removes all packs

2015-02-05 Thread Jeff King
On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:

 I'm using git p4 for synchronization with Perforce. Sometimes after 'git
 p4 rebase' git starts a garbage collection. When gc finishes, the local
 repository contains no pack files, only loose objects, so I have to
 re-import the repository from Perforce. It also doesn't contain the
 temporary pack git gc was creating.

It sounds like git didn't find any refs; it will pack only objects which
are reachable. Unreachable objects are either:

  1. Exploded into loose objects if the mtime of the pack that contains
 them is less than 2 weeks old (and will eventually expire when they
 become 2 weeks old).

  2. Dropped completely if older than 2 weeks.
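The two cases above as a simplified model in shell. The 14-day cutoff matches git's default gc.pruneExpire of 2.weeks.ago; the function name unreachable_fate and its arguments are hypothetical, and git's own decision happens inside repack and pack-objects, not in anything like this.

```sh
# unreachable_fate PACK_MTIME NOW [CUTOFF_SECONDS]
# Simplified model: fate of an unreachable object stored in a pack whose
# mtime is PACK_MTIME (seconds since the epoch) when gc runs at NOW.
unreachable_fate() {
    local pack_mtime=$1 now=$2 cutoff=${3:-1209600}   # 14 * 86400 s

    if [ $((now - pack_mtime)) -lt "$cutoff" ]; then
        # Recent enough: rewritten as a loose object, to be pruned
        # later once it too is older than the cutoff.
        echo "explode"
    else
        # Old enough: not carried over into the new pack at all.
        echo "drop"
    fi
}
```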

 One more thing about my setup: since git p4 encourages a linear
 history, I use a separate repository for another branch in Perforce. In
 order to be able to cherry-pick between repositories, I added the other
 repo's objects dir as an alternate and also added a ref which is a
 symbolic link to a branch in the other repo (so I don't have to do any
 fetches).

You can't symlink refs like this. The loose refs in the filesystem may
be migrated into the packed-refs file, at which point your symlink
will be broken. That is a likely reason why git would not find any refs.
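A throwaway reproduction of that failure mode, assuming the default files ref backend. Repository A borrows B's objects via alternates and gets a ref, from-b (a made-up name), that is a symlink to B's loose branch file; git pack-refs --all in B then removes that file.

```sh
base=$(mktemp -d)
git init -q "$base/B"
git -C "$base/B" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m base
branch=$(git -C "$base/B" symbolic-ref --short HEAD)

# A borrows B's objects through alternates, as in the setup above.
git clone -q --shared "$base/B" "$base/A"
ln -s "$base/B/.git/refs/heads/$branch" "$base/A/.git/refs/heads/from-b"
git -C "$base/A" rev-parse --verify refs/heads/from-b    # resolves for now

# B packs its refs: the loose file disappears and the link in A dangles.
git -C "$base/B" pack-refs --all
[ -e "$base/A/.git/refs/heads/from-b" ] || echo "from-b is now a dangling symlink"
```

After the last step git no longer resolves from-b in A, so anything in A's own object store that was reachable only through from-b now looks unreachable to gc in A.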

So your setup will not ever work reliably.  But IMHO, it is a bug that
git does not notice the broken symlink and abort an operation which is
computing reachability in order to drop objects. As you noticed, it
means a misconfiguration or filesystem error results in data loss.

-Peff