Re: Blobs not referenced by file (anymore) are not removed by GC
Jeff King p...@peff.net writes:

> ...
>> I'd be happy to contribute a patch that gives 'gc' a flag to do the
>> equivalent of:
>>
>>     git reflog expire --expire=now --all
>>     git gc --prune=now --aggressive
>>
>> Maybe:
>>
>>     git gc --purge
>
> Yeah, that is common enough that it might be worthwhile (you probably
> want --expire-unreachable in the reflog invocation, though).

Also you would not want an unconditional --aggressive.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Blobs not referenced by file (anymore) are not removed by GC
On 10 December 2014 at 16:07, Junio C Hamano gits...@pobox.com wrote:
> Jeff King p...@peff.net writes:
>>>     git reflog expire --expire=now --all
>>>     git gc --prune=now --aggressive
>>>
>>> Maybe:
>>>
>>>     git gc --purge
>>
>> Yeah, that is common enough that it might be worthwhile (you probably
>> want --expire-unreachable in the reflog invocation, though).
>
> Also you would not want an unconditional --aggressive.

After a big rewrite deleting files the re-optimisation of --aggressive
can make a big difference to packsize - for instance 1.2GB to 768MB in
a test I just ran - but of course it is *much* slower, so I suspect
you're right about not including it.

I wasn't aware of the '--expire-unreachable=all' switch, though it
seems like a 'milder' version of the '--expire=now' switch? - in that
it would keep reflog entries if they haven't been changed, which is
fair enough and compatible with the 'purge' goal.
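The clean-up sequence discussed in this message can be sketched as a small shell function. This is only an illustration of the proposal as amended in the thread: `git_purge` is a hypothetical name, `git gc --purge` is not an existing flag, and `--aggressive` is left as an opt-in argument because of the speed cost mentioned above.

```shell
# Hypothetical wrapper; "git gc --purge" was only proposed in this thread.
git_purge() {
    # Drop reflog entries that point at history no longer reachable
    # from the current tip of their ref.
    git reflog expire --expire-unreachable=now --all || return 1

    # Repack and delete unreachable objects immediately.
    # --aggressive is opt-in because it is much slower.
    if [ "${1:-}" = "--aggressive" ]; then
        git gc --prune=now --aggressive
    else
        git gc --prune=now
    fi
}
```

Running `git_purge` inside a rewritten repository should leave only objects that some ref, the index, or a surviving reflog entry still needs; `git_purge --aggressive` additionally recomputes deltas.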
Re: Blobs not referenced by file (anymore) are not removed by GC
On Mon, Dec 08, 2014 at 05:22:23PM +0100, Martin Scherer wrote:

> # invoke bfg --delete-folders something multiple times with different pattern.
> # try to cleanup
> git gc --aggressive --prune=now
> # big blobs still in history
>
> git fsck
> # no results
> git fsck --full --unreachable --dangling
> # no results

Might you still have reflogs pointing to the objects? Try:

    git reflog expire --expire-unreachable=now --all

I also don't know if BFG keeps backup refs around (filter-branch, for
example, writes a copy of the original refs into refs/original; you
would want to delete that if you're trying to slim down the repo).

In general, you can see the on-disk size of the objects required for a
particular ref with something like:

    # sum the on-disk size of every object reachable from the arguments
    size() {
      git rev-list --objects "$@" |
      cut -d' ' -f1 |
      git cat-file --batch-check='%(objectsize:disk)' |
      perl -lne '$t += $_; END { print $t }'
    }

    # size of master branch
    size master

    # size of each ref on top of what is in the master branch
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
      echo "$(size "master..$ref") $ref"
    done |
    sort -rn

Note that these sizes are somewhat approximate. We may store object X
needed by one ref as a delta against Y used by another ref. The
accounting shows X as tiny compared to Y. And then a repack may find the
delta in the opposite direction. But if you're talking about rewriting
history to drop a bunch of gigantic objects, the output of the final
loop is a good way to see which refs are still referring to the old
history.

-Peff
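To check the reflog theory for one specific blob, something along these lines may help. It is a rough sketch, not part of Git or the BFG: `what_holds` is a hypothetical helper name, and it simply walks every ref and then all reflog entries looking for the object ID.

```shell
# Hypothetical helper: report what still reaches a given object.
what_holds() {
    obj=$(git rev-parse --verify "$1") || return 1

    # Refs whose history contains the object.
    git for-each-ref --format='%(refname)' |
    while read -r ref; do
        if git rev-list --objects "$ref" | grep "^$obj" >/dev/null; then
            echo "ref: $ref"
        fi
    done

    # Objects mentioned by reflog entries are kept alive as well.
    if git rev-list --objects --reflog | grep "^$obj" >/dev/null; then
        echo "reflog"
    fi
}
```

For example, `what_holds 9451427d7335395779b91864418630d2f0af780a` printing only `reflog` would suggest that expiring the reflogs is enough for a later `git gc --prune=now` to drop the blob. Walking every ref separately is slow on large repositories; it is written this way for readability.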
Re: Blobs not referenced by file (anymore) are not removed by GC
On 9 December 2014 at 14:14, Jeff King p...@peff.net wrote:
> On Mon, Dec 08, 2014 at 05:22:23PM +0100, Martin Scherer wrote:
>> # invoke bfg --delete-folders something multiple times with different pattern.
>> # try to cleanup
>> git gc --aggressive --prune=now
>> # big blobs still in history
>>
>> git fsck
>> # no results
>> git fsck --full --unreachable --dangling
>> # no results
>
> Might you still have reflogs pointing to the objects? Try:
>
>     git reflog expire --expire-unreachable=now --all

Yeah, we figured that's what it was!

https://github.com/rtyley/bfg-repo-cleaner/issues/62#issuecomment-66152559

> I also don't know if BFG keeps backup refs around (filter-branch, for
> example, writes a copy of the original refs into refs/original; you
> would want to delete that if you're trying to slim down the repo).

The BFG reports the ref changes to the command line (and outputs a
full list of changed object-ids in
repo-name.git.bfg-report/[datetime]/object-id-map.old-new.txt) but
doesn't keep refs (like refs/original) around because that would get
in the way of the BFG's explicit intended use-case of removing
unwanted data.

Thanks for the object-size checking scripts, very useful.

Roberto
Re: Blobs not referenced by file (anymore) are not removed by GC
On Tuesday, 9 December 2014, Jeff King p...@peff.net wrote:
> I actually think filter-branch's refs/original is a bit outdated at
> this point. The information is there in the reflogs already, and dealing
> with refs/original often causes confusion in my experience. It could
> probably use a git filter-branch --restore or something to switch each
> $ref to $ref@{1} (after making sure that the reflog entry was from
> filter-branch, of course).

Yeah, I'd agree that refs/original can cause confusion.

> Not that I expect you to want to work on filter-branch. :) But maybe
> food for thought for a BFG feature.

I haven't heard much demand for a recover/restore feature on the BFG
(I think by the time people get to the BFG, they're pretty sure they
want to go ahead with the procedure!) but I'll bear it in mind.

Mind you, to make the post-rewrite clean-up easier, I'd be happy to
contribute a patch that gives 'gc' a flag to do the equivalent of:

    git reflog expire --expire=now --all
    git gc --prune=now --aggressive

Maybe:

    git gc --purge

??
Re: Blobs not referenced by file (anymore) are not removed by GC
On Tue, Dec 09, 2014 at 10:15:31PM +, Roberto Tyley wrote:

>> Not that I expect you to want to work on filter-branch. :) But maybe
>> food for thought for a BFG feature.
>
> I haven't heard much demand for a recover/restore feature on the BFG
> (I think by the time people get to the BFG, they're pretty sure they
> want to go ahead with the procedure!) but I'll bear it in mind.
>
> Mind you, to make the post-rewrite clean-up easier, I'd be happy to
> contribute a patch that gives 'gc' a flag to do the equivalent of:
>
>     git reflog expire --expire=now --all
>     git gc --prune=now --aggressive
>
> Maybe:
>
>     git gc --purge

Yeah, that is common enough that it might be worthwhile (you probably
want --expire-unreachable in the reflog invocation, though).

-Peff
Blobs not referenced by file (anymore) are not removed by GC
Hi,

after using BFG on a repo with certain directory globs, all of those
files (names) are gone from history, but the underlying blobs cannot be
collected by garbage collection anymore. So the blobs of the underlying
files are not deleted; only the file names are no longer associated
with the blobs.

I wonder if I discovered a bug (at least in bfg). But I expect git to
discover that these blobs are not used in any way (so they have to be
associated with something, right?)

    # invoke bfg --delete-folders something multiple times with different pattern.
    # try to cleanup
    git gc --aggressive --prune=now
    # big blobs still in history

    git fsck
    # no results
    git fsck --full --unreachable --dangling
    # no results

To verify that the blobs are still there after git gc:

    git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
    head bigobjects.txt
    # outputs
    9451427d7335395779b91864418630d2f0af780a blob 7895212 1869047 7657491

Also, if bfg is told to remove the biggest blob (bfg -B 1) with
no-blob-protection, it does not succeed in removing it.

--- output of bfg -B 1
Found 1 blob ids for large blobs - biggest=7895212 smallest=7895212
BFG aborting: No refs to update - no dirty commits found??
---

The repo can be found here:
https://github.com/marscher/stallone_stale_objects

I will start all over to clean up the history, but I guess this might be
interesting for git developers.

Best,
Martin
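As an alternative to parsing `git verify-pack` output, a listing of the largest blobs can also be sketched with `git cat-file`. This assumes Git 2.6 or later (for `--batch-all-objects`, which was not available when this thread was written), and `largest_blobs` is a hypothetical helper name. It lists every blob in the object store, reachable or not, so blobs kept alive only by reflogs show up too.

```shell
# Hypothetical helper: print the N largest blobs (default 10) as
# "<size in bytes> blob <object id>", largest first.
largest_blobs() {
    git cat-file --batch-all-objects \
        --batch-check='%(objectsize) %(objecttype) %(objectname)' |
    awk '$2 == "blob"' |
    sort -rn |
    head -n "${1:-10}"
}
```

Note that `%(objectsize)` is the uncompressed size; `%(objectsize:disk)` could be substituted to get the on-disk size instead, with the delta caveats that apply to any per-object disk accounting.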
Re: Blobs not referenced by file (anymore) are not removed by GC
Hi Martin, I'm the developer of the BFG - I'd guess that there probably
isn't a bug for Git developers here, so you might want to open one or
more issues at https://github.com/rtyley/bfg-repo-cleaner/issues, where
I'd be happy to take a look.

best regards,
Roberto

On 8 Dec 2014 16:35, Martin Scherer m.sche...@fu-berlin.de wrote:
> Hi,
>
> after using BFG on a repo with certain directory globs, all of those
> files (names) are gone from history, but the underlying blobs cannot be
> collected by garbage collection anymore. So the blobs of the underlying
> files are not deleted; only the file names are no longer associated
> with the blobs.
>
> I wonder if I discovered a bug (at least in bfg). But I expect git to
> discover that these blobs are not used in any way (so they have to be
> associated with something, right?)
>
>     # invoke bfg --delete-folders something multiple times with different pattern.
>     # try to cleanup
>     git gc --aggressive --prune=now
>     # big blobs still in history
>
>     git fsck
>     # no results
>     git fsck --full --unreachable --dangling
>     # no results
>
> To verify that the blobs are still there after git gc:
>
>     git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt
>     head bigobjects.txt
>     # outputs
>     9451427d7335395779b91864418630d2f0af780a blob 7895212 1869047 7657491
>
> Also, if bfg is told to remove the biggest blob (bfg -B 1) with
> no-blob-protection, it does not succeed in removing it.
>
> --- output of bfg -B 1
> Found 1 blob ids for large blobs - biggest=7895212 smallest=7895212
> BFG aborting: No refs to update - no dirty commits found??
> ---
>
> The repo can be found here:
> https://github.com/marscher/stallone_stale_objects
>
> I will start all over to clean up the history, but I guess this might be
> interesting for git developers.
>
> Best,
> Martin