A co-worker asked me today how to save space when you have multiple checkouts of the same repository (at different revs) on the same machine. I said that since these won't de-duplicate well at the block level[1], one way to do this is with alternates.
However, I didn't know how to get the gains for an existing clone without a full re-clone, though I hadn't looked deeply into it. As it turns out I was wrong about that, which I found out when writing the following test-case, which shows that it works:

    (
        cd /tmp &&
        rm -rf /tmp/git-{master,pu,pu-alt}.git &&

        # Normal clones
        git clone --bare --no-tags --single-branch --branch master \
            https://github.com/git/git.git /tmp/git-master.git &&
        git clone --bare --no-tags --single-branch --branch pu \
            https://github.com/git/git.git /tmp/git-pu.git &&

        # An 'alternate' clone using 'master' objects from another repo
        git --bare init /tmp/git-pu-alt.git &&
        for git in git-pu.git git-pu-alt.git
        do
            echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
        done &&
        git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&

        # Respective sizes, 'alternate' clone much smaller
        du -shc /tmp/git-*.git &&

        # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git &&

        # Add another big history (GFW) to git-{pu,master}.git (in that order!)
        for repo in $(ls -d /tmp/git-*.git | sort -r)
        do
            git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
        done &&
        du -shc /tmp/git-*.git &&

        # Another GC. The objects now in git-master.git will be de-duped by all
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git
    )

This shows a scenario where we clone git.git at "master" and "pu" in different places. After the clone the relevant sizes are:

    108M    /tmp/git-master.git
    3.2M    /tmp/git-pu-alt.git
    109M    /tmp/git-pu.git
    219M    total

I.e. git-pu-alt.git is much smaller since it points via alternates to git-master.git, and the history of "pu" shares most of its objects with "master". But then how do you get those gains for git-pu.git?
Turns out you just run "git gc":

    111M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    115M    total

This is the thing I was wrong about, in retrospect probably because I'd been putting PATH_TO_REPO in objects/info/alternates, but we actually need PATH_TO_REPO/objects, and neither "git gc" nor "git fsck" will warn about this. It's probably a good idea to patch that at some point, i.e. whine about paths in alternates that don't have objects, or at the very least those that don't exist. #leftoverbits

Then when we fetch git-for-windows:master into all the repos, they all grow by the amount git-for-windows has diverged:

    144M    /tmp/git-master.git
    36M     /tmp/git-pu-alt.git
    36M     /tmp/git-pu.git
    214M    total

Note that the "sort -r" is critical here. If we had fetched into git-master.git first (at this point the alternate for git-pu*.git) we wouldn't have gotten the duplication in the first place, but instead:

    144M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    148M    total

This shows the importance of keeping such an 'alternate' repo up-to-date: if it already has the objects, the other repos never store their own copies. But regardless (this is from a run with "sort -r") a "git gc" will coalesce them:

    131M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.2M    /tmp/git-pu.git
    135M    total

If you find this interesting make sure to read my https://public-inbox.org/git/87k1s3bomt....@evledraar.gmail.com/ and https://public-inbox.org/git/87in7nbi5b....@evledraar.gmail.com/ for the caveats, i.e. if this is something intended for users then no ref in the alternate can ever be rewound, since that can potentially result in repository corruption.

1. https://public-inbox.org/git/87bmhiykvw....@evledraar.gmail.com/
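
On the #leftoverbits point about alternates entries that point at PATH_TO_REPO instead of PATH_TO_REPO/objects: until git itself complains about this, a check along the following lines could catch the mistake by hand. This is only a sketch. "check_alternates" is a hypothetical helper and not a git command, and treating "has a pack/ subdirectory" as "is an object store" is my own heuristic, not something git does. It also ignores quoted paths in the alternates file.

```shell
# Hypothetical helper (not part of git): sanity-check a repo's alternates.
# Usage: check_alternates GIT_DIR   (a bare repo, or the .git of a checkout)
# Note: plain sh has no 'local', so the variables below are global.
check_alternates () {
	objdir=$1/objects
	alt=$objdir/info/alternates
	status=0
	test -f "$alt" || return 0	# no alternates file, nothing to check
	while IFS= read -r path || test -n "$path"
	do
		case $path in
		''|'#'*) continue ;;		# skip blank lines and comments
		/*) ;;				# absolute path, use as-is
		*) path=$objdir/$path ;;	# relative to the object directory
		esac
		if ! test -d "$path"
		then
			echo "warning: alternate '$path' does not exist" >&2
			status=1
		elif ! test -d "$path/pack"
		then
			# Assumed heuristic: a real object store has a pack/ dir
			if test -d "$path/objects"
			then
				echo "warning: '$path' looks like a repository, did you mean '$path/objects'?" >&2
			else
				echo "warning: '$path' has no pack/ directory, is it an object store?" >&2
			fi
			status=1
		fi
	done <"$alt"
	return $status
}
```

With the test-case above, "check_alternates /tmp/git-pu.git" should print nothing and return 0, whereas an alternates file containing just /tmp/git-master.git should trigger the "did you mean" warning.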