Re: Big repo not shrinking on repack or gc?

2015-01-15 Thread Bryan Turner
On Thu, Jan 15, 2015 at 6:43 PM, Andreas Krey <a.k...@gmx.de> wrote:
 On Thu, 15 Jan 2015 18:05:46 +, Bryan Turner wrote:
 ...

 They do. So it seems it was forked once upon a time, but...

 /opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
 158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
 45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
 93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects

 ...there is no trace of a fork still existing (the repo in question is 143).

Yes, the system doesn't currently detect when a repository becomes
un-forked because it's not a common use case.

At this point I think we should probably take this off-list. You can
either e-mail me directly (bturner at atlassian dot com), or, better
still, raise a ticket on support.atlassian.com. Either way I'll work
with you directly to un-fork the repository on disk and allow it to
clean itself up.


 Andreas

 --
 Totally trivial. Famous last words.
 From: Linus Torvalds <torvalds@*.org>
 Date: Fri, 22 Jan 2010 07:29:21 -0800
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Bryan Turner
On Thu, Jan 15, 2015 at 4:24 AM, Junio C Hamano <gits...@pobox.com> wrote:
 Andreas Krey <a.k...@gmx.de> writes:

 On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
 ...
 You don't need the -f here. Just git repack -ad should be enough
 (and the -f probably makes it _way_ slower).

 Indeed, factor four.

 However, my expectation is that a repack -ad will remove all the
 old pack files, as what is in there is either referenced and put
 into the new pack, or dropped => there should be a single pack file
 afterwards.

 This is not the case. :-( (Done only with 1.8.2 due to
 lack of compilers for this box.)

 Guess in the dark: ls -l .git/objects/pack
 Do you see any .keep files?

I'm one of the Stash developers and just noticed this thread. If the
repository in question has been forked via Stash there likely _will_
be .keep files. Stash uses alternates for forks, so it's possible, by
deleting those kept packs and pruning objects (which you've already
done I see) that you will corrupt, or have already corrupted, some
number of the forks. (At the moment Stash packs garbage into a dead
pack which it flags with a .keep, to ensure forks don't lose access
to objects that once existed upstream that they still reference.)
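
To see the effect described above in isolation, here is a minimal sketch in a throwaway repository (not Stash's code; count_packs and demo_keep_survives_repack are hypothetical names, and git must be on the PATH). Once a pack has a .keep file next to it, 'git repack -ad' leaves that pack in place and writes the remaining objects to a new pack, so the single-pack result Andreas expected is never reached:

```shell
#!/bin/sh
# Hypothetical demo: a pack protected by a .keep file survives 'git repack -ad'.

# Print how many .pack files the git directory $1 holds.
count_packs() {
    ls "$1"/objects/pack/*.pack 2>/dev/null | wc -l
}

# Build a scratch (non-bare) repository at $1 that ends up with two packs.
demo_keep_survives_repack() {
    git init -q "$1" || return 1
    (
        cd "$1" || exit 1
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m first
        git repack -q -ad                   # everything so far -> one pack
        for p in .git/objects/pack/*.pack; do
            : > "${p%.pack}.keep"           # protect it, as Stash does
        done
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m second
        git repack -q -ad                   # kept pack is not removed
    )
}
```

Afterwards count_packs "$dir/.git" should report 2 packs. Deleting a Stash-written .keep by hand in a forked repository is exactly the kind of change warned about above.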


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
...
 You don't need the -f here. Just git repack -ad should be enough
 (and the -f probably makes it _way_ slower).

Indeed, factor four.

However, my expectation is that a repack -ad will remove all the
old pack files, as what is in there is either referenced and put
into the new pack, or dropped => there should be a single pack file
afterwards.

This is not the case. :-( (Done only with 1.8.2 due to
lack of compilers for this box.)

Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
On Thu, 15 Jan 2015 18:05:46 +, Bryan Turner wrote:
...
 No, Stash will only do that in a repository which has been forked. In
 any non-forked repository, Stash does not interact with garbage
 collection in any way. Auto GC is left enabled, and all pruning
 settings are left at their defaults. The default pruning interval is
 two weeks, so if your development approach is rebase-heavy you may
 need to adjust them.
 
 What are the contents of some of those .keep files? If they're written
 by Stash they contain a message saying so. (GENERATED BY ATLASSIAN
 STASH - DO NOT REMOVE)

They do. So it seems it was forked once upon a time, but...

/opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects

...there is no trace of a fork still existing (the repo in question is 143).
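
A rough sketch of the reverse check (which repositories, if any, borrow objects from a given one), assuming the Stash layout visible above (<repositories>/<id>/objects) and absolute paths in each alternates file. find_forks_of is a hypothetical name, and relative alternates paths, which git also accepts, are not handled:

```shell
#!/bin/sh
# Hypothetical helper: print the id of every repository under $1 whose
# objects/info/alternates file points at repository id $2.
# Assumes the <repositories>/<id>/objects layout and absolute paths.
find_forks_of() {
    repos_dir=$1
    repo_id=$2
    for alt in "$repos_dir"/*/objects/info/alternates; do
        [ -f "$alt" ] || continue                  # glob matched nothing
        # Anchor on "/<id>/objects" at end of line so 20 does not match 120.
        if grep -q "/$repo_id/objects/*\$" "$alt"; then
            fork=${alt%/objects/info/alternates}   # .../<fork-id>
            echo "${fork##*/}"                     # just <fork-id>
        fi
    done
}
```

Run from the directory listed above as 'find_forks_of . 143'; empty output would agree with the observation that nothing points at repository 143.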

Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
On Thu, 15 Jan 2015 12:23:00 +, Bryan Turner wrote:
...
  Guess in the dark: ls -l .git/objects/pack
  Do you see any .keep files?

Lots of them. :-(

 I'm one of the Stash developers and just noticed this thread. If the
 repository in question has been forked via Stash there likely _will_
 be .keep files. Stash uses alternates for forks, so it's possible, by
 deleting those kept packs and pruning objects (which you've already
 done I see) that you will corrupt, or have already corrupted, some
 number of the forks.

There are a few forks in this stash instance, but the repository in
question is neither the source nor the destination of any.

So, git seems to be mostly out of the equation now (gc and repack
apparently doing what they are supposed to do), and the question
moves to 'how can stash let such a repo grow to that size'.


 (At the moment Stash packs garbage into a dead
 pack which it flags with a .keep, to ensure forks don't lose access
 to objects that once existed upstream that they still reference.)

Does it do so even if there is no actual fork? That would
explain a lot - we are (force-)pushing new commits in there daily
(potentially big ones) that become garbage the next day, and should
be cleaned up rather fast.

(We're pulling them into another non-stash repo for longer-term keeping -
these are backups of dev repos in the form of git stash commits including
untracked files.)

Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Bryan Turner
On Thu, Jan 15, 2015 at 5:38 PM, Andreas Krey <a.k...@gmx.de> wrote:
 On Thu, 15 Jan 2015 12:23:00 +, Bryan Turner wrote:
 ...
  Guess in the dark: ls -l .git/objects/pack
  Do you see any .keep files?

 Lots of them. :-(

 I'm one of the Stash developers and just noticed this thread. If the
 repository in question has been forked via Stash there likely _will_
 be .keep files. Stash uses alternates for forks, so it's possible, by
 deleting those kept packs and pruning objects (which you've already
 done I see) that you will corrupt, or have already corrupted, some
 number of the forks.

 There are a few forks in this stash instance, but the repository in
 question is neither the source nor the destination of any.

 So, git seems to be mostly out of the equation now (gc and repack
 apparently doing what they are supposed to do), and the question
 moves to 'how can stash let such a repo grow to that size'.


 (At the moment Stash packs garbage into a dead
 pack which it flags with a .keep, to ensure forks don't lose access
 to objects that once existed upstream that they still reference.)

 Does it do so even if there is no actual fork? That would
 explain a lot - we are (force-)pushing new commits in there daily
 (potentially big ones) that become garbage the next day, and should
 be cleaned up rather fast.

No, Stash will only do that in a repository which has been forked. In
any non-forked repository, Stash does not interact with garbage
collection in any way. Auto GC is left enabled, and all pruning
settings are left at their defaults. The default pruning interval is
two weeks, so if your development approach is rebase-heavy you may
need to adjust them.

What are the contents of some of those .keep files? If they're written
by Stash they contain a message saying so. (GENERATED BY ATLASSIAN
STASH - DO NOT REMOVE)
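
One way to check that across a whole pack directory is to grep for the marker text quoted above. This sketch only reads files and deletes nothing; classify_keep_files is a hypothetical name, and matching on the "GENERATED BY ATLASSIAN STASH" prefix assumes the wording is exactly as described:

```shell
#!/bin/sh
# Hypothetical helper: label each .keep file in pack directory $1 as
# "stash" (carries the Stash marker) or "other". Read-only.
classify_keep_files() {
    pack_dir=$1
    for keep in "$pack_dir"/*.keep; do
        [ -f "$keep" ] || continue      # glob matched nothing
        if grep -q 'GENERATED BY ATLASSIAN STASH' "$keep"; then
            echo "stash $keep"
        else
            echo "other $keep"
        fi
    done
}
```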


 (We're pulling them into another non-stash repo for longer-term keeping -
 these are backups of dev repos in the form of git stash commits including
 untracked files.)

 Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
On Wed, 14 Jan 2015 15:39:46 +, Andreas Krey wrote:
...
 This is not the case. :-( (Done only with 1.8.2 due to
 lack of compilers for this box.)

Nor with current git (I copied the repo to another machine).
There is one new pack file of a plausible size (25G),
and 65G worth of old packfiles.

Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Junio C Hamano
Andreas Krey <a.k...@gmx.de> writes:

 On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
 ...
 You don't need the -f here. Just git repack -ad should be enough
 (and the -f probably makes it _way_ slower).

 Indeed, factor four.

 However, my expectation is that a repack -ad will remove all the
 old pack files, as what is in there is either referenced and put
 into the new pack, or dropped => there should be a single pack file
 afterwards.

 This is not the case. :-( (Done only with 1.8.2 due to
 lack of compilers for this box.)

Guess in the dark: ls -l .git/objects/pack
Do you see any .keep files?
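
To put numbers on that question, this sketch sums pack sizes according to whether a matching .keep file exists. pack_sizes_by_keep is a hypothetical helper that relies on the pack-<hash>.pack / pack-<hash>.keep naming in objects/pack. Kept packs are left alone by 'git repack -ad', so a large "kept" total would account for old pack files surviving a repack:

```shell
#!/bin/sh
# Hypothetical helper: print total bytes of packs in directory $1 that
# have a .keep file ("kept") and of those that do not ("unkept").
pack_sizes_by_keep() {
    pack_dir=$1
    kept=0
    unkept=0
    for pack in "$pack_dir"/*.pack; do
        [ -f "$pack" ] || continue          # glob matched nothing
        size=$(wc -c < "$pack")
        if [ -f "${pack%.pack}.keep" ]; then
            kept=$((kept + size))
        else
            unkept=$((unkept + size))
        fi
    done
    echo "kept=$kept unkept=$unkept"
}
```

For a bare repository such as Stash's, pass <repo>/objects/pack; for a non-bare one, .git/objects/pack.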


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
 On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:
 
  I have a repo here that is 130G, but when I clone --mirror it, the result
  is only 25G big.  Because of the --mirror I don't think that I missed
  any refs that keep objects only in the source repo.
 
 Perhaps some objects are mentioned by reflogs, but not by the refs? They
 would not be transferred as part of a clone. Try:
 
  git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
  git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs

Actually, the output of 'git rev-list --objects --reflog' is empty, and
there isn't even a reflog (or similar) directory. (This is a bare repo
inside Atlassian Stash.)

...
  I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
 
 You don't need the -f here. Just git repack -ad should be enough
 (and the -f probably makes it _way_ slower).

Right, the -f is a leftover workaround for old JGit versions in another repo.

Apparently, part of the trick is --prune=all or similar on 'git gc',
to get rid of the loose objects faster. That got a copy of the repo
down to around 70G - still a long way to go.
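
A minimal sketch for measuring the loose-object side of this with 'git count-objects -v' before and after an immediate-prune gc. loose_count and gc_prune_now are hypothetical names, and the sketch uses --prune=now, which skips the usual two-week grace period. In a repository that Stash forks borrow from, this is the kind of pruning warned about elsewhere in this thread:

```shell
#!/bin/sh
# Hypothetical helper: print the loose-object count of repository $1,
# taken from the "count:" line of 'git count-objects -v'.
loose_count() {
    (cd "$1" && git count-objects -v) | sed -n 's/^count: //p'
}

# Hypothetical helper: report loose objects before and after a gc that
# prunes unreachable loose objects regardless of their age.
gc_prune_now() {
    echo "before: $(loose_count "$1")"
    (cd "$1" && git gc -q --prune=now)
    echo "after: $(loose_count "$1")"
}
```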

Andreas


Big repo not shrinking on repack or gc?

2015-01-14 Thread Andreas Krey
Hi everybody,

I have a repo here that is 130G, but when I clone --mirror it, the result
is only 25G big.  Because of the --mirror I don't think that I missed
any refs that keep objects only in the source repo.

I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
but it only shaved off 3G, and there are a lot of loose objects and old
pack files that I simply don't expect to be there after a repack.

Shouldn't 'git gc' (even without --aggressive) or a 'repack -fad' remove
those redundant objects and packs?

How can I clean this up? (Additional problem: I don't have enough space
to run a repack anymore.)

Andreas


Re: Big repo not shrinking on repack or gc?

2015-01-14 Thread Jeff King
On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:

 I have a repo here that is 130G, but when I clone --mirror it, the result
 is only 25G big.  Because of the --mirror I don't think that I missed
 any refs that keep objects only in the source repo.

Perhaps some objects are mentioned by reflogs, but not by the refs? They
would not be transferred as part of a clone. Try:

  git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
  git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs
  comm -13 reachable reflogs |
  git cat-file --batch-check='%(objectsize:disk)' |
  perl -lne '$total += $_; END { print $total }'

That should print the size, in bytes, that reflog-only objects are using
on disk.  You can use git reflog expire --expire-unreachable=now --all
to get rid of them (and then repack).
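
That cleanup can be wrapped as below, as a sketch rather than a vetted procedure (drop_reflog_only_objects is a hypothetical name). The final 'git prune --expire=now' is an addition: 'repack -ad' drops unreachable objects that were already packed, but unreachable loose objects stay until pruned. Expiring reflogs with "now" also discards the recovery safety net reflogs normally provide, and in a bare repository without reflogs there is nothing for the first step to expire:

```shell
#!/bin/sh
# Hypothetical helper for repository $1: forget reflog entries that point
# at unreachable commits, repack what is still reachable, then prune any
# loose objects that nothing references any more.
drop_reflog_only_objects() {
    (
        cd "$1" || exit 1
        git reflog expire --expire-unreachable=now --all &&
        git repack -q -ad &&
        git prune --expire=now
    )
}
```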

 I already tried 'git repack -fad' and 'git gc' to shrink the original repo,

You don't need the -f here. Just git repack -ad should be enough
(and the -f probably makes it _way_ slower).

-Peff