On Thursday, November 7, 2013 11:14:31 PM UTC+1, Gergely Polonkai wrote:

> Hello,
>
> I know this has come up several times before, but today I ran into the 
> problem myself.
>
> I have a project tracked by git, which contained source code for both a 
> binary and a library. The library had several relatively large (~30 MB 
> each, ~400 MB total) data files, which were generated by an external 
> program, and will never be modified later.
>
> As the project matured, it turned out that the library can be a standalone 
> product, so we have moved data/ into lib-src/, and “exported” that 
> directory to a separate repository with git-subtree. We don’t plan to merge 
> them together any more. This, however, made the checkout of the binary’s 
> repository a bit slow, as the index still holds these files.
>
> Some additional information: before the subtree operation, the binary was 
> more like a test suite for the library, only the last few commits may 
> contain relevant (e.g. code that is still in the binary) changes to both 
> the binary and the lib.
>
> My question is: is it a good idea to remove these now quasi-unused files 
> from the index?
>

Version control systems track every version of every file that was ever in 
the repository. Deleting the files in a new commit takes them out of the 
index and the working tree, but the old versions stay in history. If you 
want to shrink the repository itself, you have to go through history and 
erase every trace that the files ever existed; only then do you actually 
win back the space.
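
To illustrate the point, here is a small sketch in a throwaway repository. 
The path data/big.bin and the 1 MiB size are made up for the demo; they 
stand in for your generated data files. After the file is deleted in a new 
commit, the blob is still reachable from history:

```shell
#!/bin/sh
# Sketch only: runs in a temporary repository, not in your project.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical stand-in for one large generated data file (1 MiB).
mkdir data
head -c 1048576 /dev/urandom > data/big.bin
git add data
git commit -q -m "Add data file"

# Remove the file from the index and the working tree, then commit.
git rm -q -r data
git commit -q -m "Remove data file"

# List every blob reachable from any ref and keep the largest one.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' | sort -k2 -n -r | head -n 1)
echo "$largest"   # prints: blob 1048576 data/big.bin
```

Running the same rev-list/cat-file pipeline in your real repository should 
show which paths are worth removing from history.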

In many cases, the large files should never have been added in the first 
place, so it's safe to remove them from history. In other cases, the files 
might have been important for the consistency of things at a certain time, 
and keeping them in history has value.

It's hard to say from the outside which aspects of history are important 
for you to keep.

Once you decide that you want to remove files from history, I do recommend 
trying out the BFG repo cleaner, as it is quite user-friendly and handles 
the common use cases really well: http://rtyley.github.io/bfg-repo-cleaner/

If that doesn't get you where you want to be, the standard Git tool for 
such things is git filter-branch in its various modes and 
uses: https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html 
(google for more examples and how-to's).
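
As a rough sketch of the filter-branch route, assuming the files live in a 
top-level data/ directory (as in your original layout; adjust the path to 
your repository), something like the following could work. The demo builds 
its own throwaway repository with made-up file names so it is safe to run 
as-is:

```shell
#!/bin/sh
# Sketch only: rewrites history in a temporary repository.
set -e
# Newer Git versions print a warning and pause before filter-branch runs;
# this variable suppresses that. Older versions ignore it.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical content: one source file and one large data file.
mkdir data src
head -c 1048576 /dev/urandom > data/big.bin
echo 'int main(void) { return 0; }' > src/main.c
git add data src
git commit -q -m "Add source and data"
git rm -q -r data
git commit -q -m "Drop data from the working tree"

# Rewrite every commit on every ref, removing data/ from each tree.
# --prune-empty drops commits that no longer change anything.
git filter-branch --force --prune-empty \
  --index-filter 'git rm -r -q --cached --ignore-unmatch data' \
  -- --all >/dev/null

# filter-branch keeps the old refs under refs/original/. Delete them,
# expire the reflog, and garbage-collect so the blobs can really go away.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

# Count how often the data file is still reachable from any ref.
remaining=$(git rev-list --objects --all | grep -c 'data/big.bin' || true)
echo "$remaining"   # prints: 0
```

Until the refs/original/ cleanup and the gc have run, the old objects are 
still in the repository and its size will not drop.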

Note that once you rewrite history, collaborators will have to re-clone the 
newly rewritten repository. Also make sure to take backups before you start 
rewriting history.
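
For the backup, one possible approach is a mirror clone, a bundle file, or 
both. The sketch below uses a throwaway repository; the names project, 
project-backup.git and project-backup.bundle are placeholders:

```shell
#!/bin/sh
# Sketch only: backs up a temporary demo repository in two ways.
set -e
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Option 1: a mirror clone copies all refs into a bare repository.
git clone -q --mirror "$work/project" "$work/project-backup.git"

# Option 2: a bundle packs all refs into a single file that is easy to
# copy elsewhere; "bundle verify" checks that it is usable.
git bundle create "$work/project-backup.bundle" --all
git bundle verify "$work/project-backup.bundle"

# Compare the tip of the original with the tip of the mirror.
orig_head=$(git rev-parse HEAD)
backup_head=$(git --git-dir="$work/project-backup.git" rev-parse HEAD)
```

If the rewrite goes wrong, you can clone again from either the mirror or 
the bundle and start over.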

