Re: Big repo not shrinking on repack or gc?
On Thu, Jan 15, 2015 at 6:43 PM, Andreas Krey <a.k...@gmx.de> wrote:
> On Thu, 15 Jan 2015 18:05:46 +, Bryan Turner wrote:
> ...
> They do. So it seems it was forked once upon a time, but...
>
> /opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
> 158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
> 45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
> 93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects
>
> ...there is no trace of a fork still existing (the repo in question is 143).

Yes, the system doesn't currently detect when a repository becomes un-forked, because it's not a common use case.

At this point I think we should probably take this off-list. You can either e-mail me directly (bturner at atlassian dot com) or, better still, raise a ticket on support.atlassian.com. Either way, I'll work with you directly to un-fork the repository on disk and allow it to clean itself up.

> Andreas
> --
> Totally trivial. Famous last words.
> From: Linus Torvalds <torvalds@*.org>
> Date: Fri, 22 Jan 2010 07:29:21 -0800

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Big repo not shrinking on repack or gc?
On Thu, Jan 15, 2015 at 4:24 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Andreas Krey <a.k...@gmx.de> writes:
>
>> On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
>> ...
>>> You don't need the -f here. Just "git repack -ad" should be enough (and the -f probably makes it _way_ slower).
>>
>> Indeed, by a factor of four.
>>
>> However, my expectation is that a 'repack -ad' will remove all the old pack files, as what is in there is either referenced and put into the new pack, or dropped, so there should be a single pack file afterwards.
>>
>> This is not the case. :-(
>>
>> (Done only with 1.8.2 due to lack of compilers for this box.)
>
> Guess in the dark:
>
>     ls -l .git/objects/pack
>
> Do you see any .keep files?

I'm one of the Stash developers and just noticed this thread. If the repository in question has been forked via Stash, there likely _will_ be .keep files. Stash uses alternates for forks, so it's possible, by deleting those kept packs and pruning objects (which you've already done, I see), that you will corrupt, or have already corrupted, some number of the forks.

(At the moment, Stash packs garbage into a dead pack, which it flags with a .keep, to ensure forks don't lose access to objects that once existed upstream and that they still reference.)
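The mechanism Bryan describes relies on standard git behaviour: `git repack -a -d` leaves any pack that has a matching `.keep` file in place, so a kept pack never disappears in a full repack. A minimal sketch in a throwaway repository (the temporary directory, file names and commit identity are illustrative assumptions) counts the packs after a full repack with and without a `.keep` marker. It shows only plain git behaviour, not what Stash itself does.

```shell
set -e
# Illustrative identity so that 'git commit' works in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# First commit: pack it, then mark that pack as kept with an empty .keep file.
echo one >a.txt
git add a.txt
git commit -q -m one
git repack -q -d
kept=$(ls .git/objects/pack/*.pack)
: >"${kept%.pack}.keep"

# Second commit, then a full repack that would otherwise leave a single pack.
echo two >b.txt
git add b.txt
git commit -q -m two
git repack -q -a -d
with_keep=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')

# Drop the .keep marker and repack again: now everything ends up in one pack.
rm "${kept%.pack}.keep"
git repack -q -a -d
without_keep=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')

echo "packs with .keep: $with_keep, without .keep: $without_keep"
```

In other words, as long as a `.keep` sits next to an old pack, `repack -ad` cannot bring the repository down to a single pack, which matches the leftover old packs reported in this thread.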
Re: Big repo not shrinking on repack or gc?
On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
...
> You don't need the -f here. Just "git repack -ad" should be enough (and the -f probably makes it _way_ slower).

Indeed, by a factor of four.

However, my expectation is that a 'repack -ad' will remove all the old pack files, as what is in there is either referenced and put into the new pack, or dropped, so there should be a single pack file afterwards.

This is not the case. :-(

(Done only with 1.8.2 due to lack of compilers for this box.)

Andreas
Re: Big repo not shrinking on repack or gc?
On Thu, 15 Jan 2015 18:05:46 +, Bryan Turner wrote:
...
> No, Stash will only do that in a repository which has been forked. In any non-forked repository, Stash does not interact with garbage collection in any way. Auto GC is left enabled, and all pruning settings are left at their defaults. The default pruning interval is two weeks, so if your development approach is rebase-heavy you may need to adjust them.
>
> What are the contents of some of those .keep files? If they're written by Stash, they contain a message saying so. (GENERATED BY ATLASSIAN STASH - DO NOT REMOVE)

They do. So it seems it was forked once upon a time, but...

/opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects

...there is no trace of a fork still existing (the repo in question is 143).

Andreas
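The grep above can be narrowed to one repository: look for any alternates file that points at that repository's objects directory. If nothing matches, no other repository on disk borrows objects from it. This is a sketch only: `find_borrowers` is a hypothetical helper, and the `<root>/<id>/objects/info/alternates` layout is assumed from the listing above. Matching on the `/<id>/objects` suffix is deliberate, because the stored paths use `/data/opt_apps/...` while the listing was run under `/opt/apps/...`.

```shell
# Hypothetical helper: print the id of every repository under ROOT whose
# objects/info/alternates file points at the objects directory of repository ID.
# Assumed layout (from the grep output above): ROOT/<id>/objects/info/alternates
find_borrowers() {
    root=$1
    id=$2
    for alt in "$root"/*/objects/info/alternates; do
        [ -f "$alt" ] || continue        # the glob matched nothing
        # Each line of an alternates file names another objects directory.
        # Match only the "/<id>/objects" suffix (optional trailing slash),
        # since the stored paths may use a different mount point than ROOT.
        if grep -q "/$id/objects/*\$" "$alt"; then
            rel=${alt#"$root"/}          # e.g. 158/objects/info/alternates
            echo "${rel%%/*}"            # e.g. 158
        fi
    done
}

# Demo with a fake directory tree mirroring part of the listing above.
demo=$(mktemp -d)
mkdir -p "$demo/158/objects/info" "$demo/143/objects"
echo /data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects \
    >"$demo/158/objects/info/alternates"
echo "borrowing from 20: $(find_borrowers "$demo" 20)"
echo "borrowing from 143: $(find_borrowers "$demo" 143)"
```

On the server this would be run as `find_borrowers /opt/apps/atlassian/stash-data/shared/data/repositories 143`. An empty result means nothing borrows from 143; whether 143 itself borrows from another repository is simply whether `143/objects/info/alternates` exists.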
Re: Big repo not shrinking on repack or gc?
On Thu, 15 Jan 2015 12:23:00 +, Bryan Turner wrote:
...
>> Guess in the dark:
>>
>>     ls -l .git/objects/pack
>>
>> Do you see any .keep files?

Lots of them. :-(

> I'm one of the Stash developers and just noticed this thread. If the repository in question has been forked via Stash, there likely _will_ be .keep files. Stash uses alternates for forks, so it's possible, by deleting those kept packs and pruning objects (which you've already done, I see), that you will corrupt, or have already corrupted, some number of the forks.

There are a few forks in this Stash instance, but the repository in question is neither the source nor the destination of any.

So git seems to be mostly out of the equation now (gc and repack apparently do what they are supposed to do), and the question becomes how Stash can let such a repo grow to that size.

> (At the moment, Stash packs garbage into a dead pack, which it flags with a .keep, to ensure forks don't lose access to objects that once existed upstream and that they still reference.)

Does it do so in any case, even if there is no actual fork? That would explain a lot: every day we (force-)push new commits in there (potentially big ones) that become garbage the next day and should be cleaned up rather fast.

(We're pulling them into another non-Stash repo for longer-term keeping; these are backups of dev repos in the form of 'git stash' commits, including untracked files.)

Andreas
Re: Big repo not shrinking on repack or gc?
On Thu, Jan 15, 2015 at 5:38 PM, Andreas Krey <a.k...@gmx.de> wrote:
> On Thu, 15 Jan 2015 12:23:00 +, Bryan Turner wrote:
> ...
>>> Guess in the dark:
>>>
>>>     ls -l .git/objects/pack
>>>
>>> Do you see any .keep files?
>
> Lots of them. :-(
>
>> I'm one of the Stash developers and just noticed this thread. If the repository in question has been forked via Stash, there likely _will_ be .keep files. Stash uses alternates for forks, so it's possible, by deleting those kept packs and pruning objects (which you've already done, I see), that you will corrupt, or have already corrupted, some number of the forks.
>
> There are a few forks in this Stash instance, but the repository in question is neither the source nor the destination of any.
>
> So git seems to be mostly out of the equation now (gc and repack apparently do what they are supposed to do), and the question becomes how Stash can let such a repo grow to that size.
>
>> (At the moment, Stash packs garbage into a dead pack, which it flags with a .keep, to ensure forks don't lose access to objects that once existed upstream and that they still reference.)
>
> Does it do so in any case, even if there is no actual fork? That would explain a lot: every day we (force-)push new commits in there (potentially big ones) that become garbage the next day and should be cleaned up rather fast.

No, Stash will only do that in a repository which has been forked. In any non-forked repository, Stash does not interact with garbage collection in any way. Auto GC is left enabled, and all pruning settings are left at their defaults. The default pruning interval is two weeks, so if your development approach is rebase-heavy you may need to adjust them.

What are the contents of some of those .keep files? If they're written by Stash, they contain a message saying so. (GENERATED BY ATLASSIAN STASH - DO NOT REMOVE)

> (We're pulling them into another non-Stash repo for longer-term keeping; these are backups of dev repos in the form of 'git stash' commits, including untracked files.)
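The pruning interval Bryan mentions corresponds to git's `gc.pruneExpire` setting, which defaults to `2.weeks.ago` and controls how old an unreachable loose object must be before `git gc` prunes it. A minimal sketch of shortening it for one repository follows; a scratch bare repository stands in for the Stash one, `1.day.ago` is only an example value, and whether Stash tolerates per-repository changes like this is an assumption worth confirming with Atlassian first.

```shell
set -e
repo=$(mktemp -d)
git init -q --bare "$repo"

# When unset, git falls back to its built-in default of 2.weeks.ago.
git -C "$repo" config --get gc.pruneExpire || echo "gc.pruneExpire unset (default 2.weeks.ago)"

# Example value: let 'git gc' prune unreachable loose objects once they are a day old.
git -C "$repo" config gc.pruneExpire 1.day.ago
git -C "$repo" config --get gc.pruneExpire
```

`git -C` needs git 1.8.5 or newer; on the 1.8.2 box the same `git config` commands can be run from inside the repository directory instead.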
Re: Big repo not shrinking on repack or gc?
On Wed, 14 Jan 2015 15:39:46 +, Andreas Krey wrote:
...
> This is not the case. :-(
>
> (Done only with 1.8.2 due to lack of compilers for this box.)

Nor with current git (I copied the repo to another machine). There is one new pack file of a plausible size (25G), and 65G worth of old pack files.

Andreas
Re: Big repo not shrinking on repack or gc?
Andreas Krey <a.k...@gmx.de> writes:

> On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
> ...
>> You don't need the -f here. Just "git repack -ad" should be enough (and the -f probably makes it _way_ slower).
>
> Indeed, by a factor of four.
>
> However, my expectation is that a 'repack -ad' will remove all the old pack files, as what is in there is either referenced and put into the new pack, or dropped, so there should be a single pack file afterwards.
>
> This is not the case. :-(
>
> (Done only with 1.8.2 due to lack of compilers for this box.)

Guess in the dark:

    ls -l .git/objects/pack

Do you see any .keep files?
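Junio's check can be taken one step further: for each pack, print its size, whether a matching `.keep` exists, and whether that `.keep` contains the `GENERATED BY ATLASSIAN STASH` text quoted in this thread. `list_packs` is a hypothetical helper that uses only POSIX tools; it takes the repository's git directory, which for a bare repository such as those under Stash is the repository directory itself.

```shell
# Hypothetical helper: one line per pack, "<status> <bytes> <pack file name>",
# where status is unkept, kept, or kept-by-stash.
list_packs() {
    for pack in "$1"/objects/pack/*.pack; do
        [ -f "$pack" ] || continue           # no packs: the glob stays unexpanded
        keep=${pack%.pack}.keep
        status=unkept
        if [ -f "$keep" ]; then
            status=kept
            if grep -q 'GENERATED BY ATLASSIAN STASH' "$keep"; then
                status=kept-by-stash
            fi
        fi
        bytes=$(wc -c <"$pack" | tr -d ' ')
        echo "$status $bytes $(basename "$pack")"
    done
}

# Demo in a throwaway repository (illustrative identity and paths).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
git init -q "$demo"
echo data >"$demo/f.txt"
git -C "$demo" add f.txt
git -C "$demo" commit -q -m demo
git -C "$demo" repack -q -d
before=$(list_packs "$demo/.git")
for p in "$demo"/.git/objects/pack/*.pack; do
    echo "GENERATED BY ATLASSIAN STASH - DO NOT REMOVE" >"${p%.pack}.keep"
done
after=$(list_packs "$demo/.git")
printf '%s\n%s\n' "$before" "$after"
```

In a real repository the call would be `list_packs "$(git rev-parse --git-dir)"`, and the sizes in the second column show how much of the total sits in kept packs.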
Re: Big repo not shrinking on repack or gc?
On Wed, 14 Jan 2015 07:49:36 +, Jeff King wrote:
> On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:
>> I have a repo here that is 130G, but when I clone --mirror it, the result is only 25G big. Because of the --mirror I don't think that I missed any refs that keep objects only in the source repo.
>
> Perhaps some objects are mentioned by reflogs, but not by the refs? They would not be transferred as part of a clone. Try:
>
>   git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
>   git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs

Actually, the output of 'git rev-list --objects --reflog' is empty, and there isn't even a reflog (or similar) directory. (This is a bare repo inside Atlassian Stash.)

> ...
>> I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
>
> You don't need the -f here. Just "git repack -ad" should be enough (and the -f probably makes it _way_ slower).

Right, the -f is an old workaround for old JGit versions, used in another repo.

Apparently, part of the trick is --prune=all or similar on 'git gc', to get rid of the loose objects faster. That got a copy of the repo down to around 70G; still a long way to go.

Andreas
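The effect of `--prune` can be reproduced in miniature. By default `git gc` keeps unreachable objects younger than `gc.pruneExpire` (two weeks unless configured), while `git gc --prune=now` is the documented way to prune them regardless of age. The git-gc documentation warns that this raises the risk of corruption if another process is writing to the repository at the same time, which matters on a live Stash server. In this sketch a throwaway repository is used and a dangling blob stands in for discarded pushes; both are illustrative assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo hello >f.txt
git add f.txt
git commit -q -m init

# Write an object that nothing references: it is unreachable from the start.
dangling=$(echo "scratch data" | git hash-object -w --stdin)

# Plain gc: the object is younger than two weeks, so it is kept.
git gc -q
git cat-file -e "$dangling" && kept_default=yes || kept_default=no

# gc --prune=now: unreachable objects are removed regardless of age.
git gc -q --prune=now
git cat-file -e "$dangling" && kept_now=yes || kept_now=no

echo "dangling blob present after plain gc: $kept_default, after --prune=now: $kept_now"
```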
Big repo not shrinking on repack or gc?
Hi everybody,

I have a repo here that is 130G, but when I clone --mirror it, the result is only 25G big. Because of the --mirror I don't think that I missed any refs that keep objects only in the source repo.

I already tried 'git repack -fad' and 'git gc' to shrink the original repo, but it only shaved off 3G, and there are a lot of loose objects and old pack files that I simply don't expect to be there after a repack.

Shouldn't 'git gc' (even without --aggressive) or a 'repack -fad' remove those redundant objects and packs? How do I clean this up?

(Additional problem: I don't have enough space to run a repack anymore.)

Andreas
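`git count-objects -v` reports the number and size of loose objects and the number and size of packs, which helps split the 130G into a loose-object part and a pack part. The sketch below, in a throwaway repository, also illustrates one reason loose objects can survive a repack: `git repack -a -d` packs only reachable objects and then deletes just the loose copies that are now packed, so unreachable loose objects stay until `git prune` (or `git gc`) removes them. Whether this explains the loose objects in this particular repository is an assumption; `loose_count` is a hypothetical helper.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The "count:" field of 'git count-objects -v' is the number of loose objects.
loose_count() {
    git count-objects -v | sed -n 's/^count: //p'
}

echo hello >f.txt
git add f.txt
git commit -q -m init                                      # blob, tree, commit: 3 loose objects
echo unreferenced | git hash-object -w --stdin >/dev/null  # 1 unreachable loose object
initial=$(loose_count)

git repack -q -a -d      # packs the 3 reachable objects and deletes their loose copies
after_repack=$(loose_count)

git prune --expire=now   # removes the unreachable loose object regardless of age
after_prune=$(loose_count)

git count-objects -v
echo "loose objects: initial=$initial after_repack=$after_repack after_prune=$after_prune"
```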
Re: Big repo not shrinking on repack or gc?
On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:

> I have a repo here that is 130G, but when I clone --mirror it, the result is only 25G big. Because of the --mirror I don't think that I missed any refs that keep objects only in the source repo.

Perhaps some objects are mentioned by reflogs, but not by the refs? They would not be transferred as part of a clone. Try:

  git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
  git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs
  comm -13 reachable reflogs |
  git cat-file --batch-check='%(objectsize:disk)' |
  perl -lne '$total += $_; END { print $total }'

That should print the size, in bytes, that reflog-only objects are using on disk. You can use:

  git reflog expire --expire-unreachable=now --all

to get rid of them (and then repack).

> I already tried 'git repack -fad' and 'git gc' to shrink the original repo,

You don't need the -f here. Just "git repack -ad" should be enough (and the -f probably makes it _way_ slower).

-Peff
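Peff's pipeline can be tried on a small scale first. In this sketch a throwaway repository gets one commit that is then amended, so the original commit, its tree and its blob stay reachable only through the reflog. `reflog_only_count` is a hypothetical helper that runs the same `rev-list`/`comm` steps but counts objects instead of summing `%(objectsize:disk)`; the reflog is then expired as Peff suggests. Note that bare repositories, like the ones under Stash, do not write reflogs by default (`core.logAllRefUpdates` is off there), which fits the empty `--reflog` output reported in this thread.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Number of objects reachable from reflogs but not from any ref.
reflog_only_count() {
    lists=$(mktemp -d)
    git rev-list --objects --all | cut -d' ' -f1 | sort >"$lists/reachable"
    git rev-list --objects --reflog | cut -d' ' -f1 | sort >"$lists/reflogs"
    comm -13 "$lists/reachable" "$lists/reflogs" | wc -l | tr -d ' '
    rm -rf "$lists"
}

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo v1 >f.txt
git add f.txt
git commit -q -m first
echo v2 >f.txt
git commit -q -a --amend -m first-amended   # old commit, tree and blob now live only in reflogs

before=$(reflog_only_count)
git reflog expire --expire-unreachable=now --all
after=$(reflog_only_count)
echo "reflog-only objects: before=$before after=$after"
```

After the expiry the objects are merely unreachable; they still occupy disk space until a repack and prune (for example `git gc --prune=now`) removes them.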