On Thursday, November 7, 2013 11:14:31 PM UTC+1, Gergely Polonkai wrote:
> Hello,
>
> I know it was on topic several times before, but today this problem
> also came to me.
>
> I have a project tracked by git, which contained source code for both
> a binary and a library. The library had several relatively large
> (~30 MB each, ~400 MB total) data files, which were generated by an
> external program, and will never be modified later.
>
> As the project matured, it turned out that the library can be a
> standalone product, so we have moved data/ into lib-src/, and
> “exported” that directory to a separate repository with git-subtree.
> We don’t plan to merge them together any more. This, however, made the
> checkout of the binary’s repository a bit slow, as the index still
> holds these files.
>
> Some additional information: before the subtree operation, the binary
> was more like a test suite for the library, only the last few commits
> may contain relevant (e.g. code that is still in the binary) changes
> to both the binary and the lib.
>
> My question is: is it a good idea to remove these now quasi-unused
> files from the index?

Version control systems keep every version of every file that was ever committed to the repository. Deleting the files from the current tree (or from the index) therefore does not shrink anything: to actually win back the space, you have to go through history and erase every trace that the files ever existed.

In many cases the large files should never have been added in the first place, so it is safe to remove them from history. In other cases the files were needed for things to be consistent at a certain point in time, and keeping them in history has value. From an external point of view, it is hard to say which aspects of history are important for you to keep.

Once you have decided to remove files from history, I recommend trying the BFG Repo-Cleaner, as it is quite user-friendly and handles the common use cases really well:

http://rtyley.github.io/bfg-repo-cleaner/

If that doesn't get you where you want, the standard Git tool for such things is git filter-branch, in its various modes and uses:

https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html

(Google for more examples and how-tos.)

Note that rewriting history gives every rewritten commit a new ID, so collaborators will have to re-clone the rewritten repository. Also make sure to take backups before you start rewriting history.
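
To make the filter-branch route concrete, here is a minimal sketch that runs entirely in a throwaway repository. Only the data/ path comes from the question; the scratch directory, the file names (blob.bin, main.c) and the commit identity are made up for illustration, and a real repository will likely need more paths listed (lib-src/data, for instance). Treat it as a starting point under those assumptions, not as the one correct recipe.

```shell
#!/bin/sh
set -eu

# Work in a scratch repository so that no real history is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.invalid"   # hypothetical identity
git config user.name "Demo User"

# Commit 1: some source code plus a generated data file under data/.
mkdir data
head -c 1048576 /dev/urandom > data/blob.bin    # stand-in for a large data file
echo 'int main(void) { return 0; }' > main.c
git add .
git commit -q -m "Add source and generated data"
blob=$(git rev-parse HEAD:data/blob.bin)        # remember the blob's object ID

# Commit 2: data/ leaves the tree, but history still carries the blob.
git rm -r -q data
git commit -q -m "Move data out of this repository"

# Rewrite every ref so that data/ never existed. --prune-empty drops
# commits that become empty once the path is gone.
export FILTER_BRANCH_SQUELCH_WARNING=1          # skip the warning pause in newer Git
git filter-branch --index-filter \
    'git rm -r -q --cached --ignore-unmatch data' \
    --prune-empty -- --all

# filter-branch keeps the old history under refs/original/ and in the
# reflogs; both must go before gc can actually delete the old objects.
git for-each-ref --format='delete %(refname)' refs/original/ |
    git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

# Informational: object counts and sizes after the cleanup.
git count-objects -v
```

On a real repository you would run only the filter-branch and cleanup steps, on a fresh clone and with a backup at hand, and then force-push the rewritten branches. As far as I know, the BFG equivalent for a whole directory is its --delete-folders option, run against a mirror clone; check the BFG page linked above for the exact usage.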