Re: Local clones aka forks disk size optimization
On Fri, Nov 16, 2012 at 11:34 PM, Enrico Weigelt enrico.weig...@vnc.biz wrote:

>> Provide one main clone which is bare, pulls automatically, and is there to stay (no pruning), so that all others can use that as a reliable alternates source.
>
> The problem here, IMHO, is the assumption that the main repo will never be cleaned up. But what to do if you don't want to let it grow forever?

That's not the only problem. I believe you only get the savings when the main repo gets the commits first. Which is probably ok most of the time, but it's worth mentioning.

> hmm, distributed GC is a tricky problem.

Except for one little issue (see other thread, subject line "cloning a namespace downloads all the objects"), namespaces appear to do everything we want in terms of the typical use cases for alternates, and/or 'git clone -l', at least on the server side.

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Local clones aka forks disk size optimization
Hi,

> That's not the only problem. I believe you only get the savings when the main repo gets the commits first. Which is probably ok most of the time, but it's worth mentioning.

Well, the saving will just be deferred to the point where the commit finally makes it to the main repo and the downstreams are gc'ed.

>> hmm, distributed GC is a tricky problem.
>
> Except for one little issue (see other thread, subject line "cloning a namespace downloads all the objects"), namespaces appear to do everything we want in terms of the typical use cases for alternates, and/or 'git clone -l', at least on the server side.

hmm, not sure about the actual internals, but that namespace filtering should work in a way that a local clone never sees (or considers) remote refs outside of the requested namespace. Perhaps that should be handled entirely on the server side, so that all called commands treat these refs as nonexistent.

By the way: what happens if one tries to clone from a broken repo (one which has several refs pointing to nonexisting objects)?

cu
--
Mit freundlichen Grüßen / Kind regards

Enrico Weigelt
VNC - Virtual Network Consult GmbH
Head Of Development
Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59
enrico.weig...@vnc.biz; www.vnc.de
Re: Local clones aka forks disk size optimization
2012/11/15 Javier Domingo javier...@gmail.com:

> Is there any way to avoid this? I mean, can something be done in git, so that it checks (when pulling) for the same objects in the other forks?

I've been using git-new-workdir (https://github.com/git/git/blob/master/contrib/workdir/git-new-workdir) for a similar problem. Maybe that's what you're searching for?

Joerg.
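In case it helps to see the mechanism: git-new-workdir gives you a second working directory whose .git shares the object store, refs, and config with the original via symlinks, so nothing is duplicated on disk. A minimal sketch of that idea (the real contrib script is more careful; all paths and repo names here are made up):

```shell
# Sketch of the git-new-workdir idea: a second working directory whose
# .git shares objects/refs/config with the original via symlinks.
# HEAD and the index stay private, so each workdir can sit on its own
# branch. Paths are illustrative, not from the thread.
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q main
git -C main -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial"

mkdir -p wd2/.git
for item in objects refs packed-refs config logs info hooks; do
    # Share everything except HEAD and the index.
    [ -e "main/.git/$item" ] && ln -s "$PWD/main/.git/$item" "wd2/.git/$item"
done
cp main/.git/HEAD wd2/.git/HEAD

git -C wd2 checkout -qf    # populate the second working tree
```

Both directories now see the same history and the same branches, with a single object store behind them.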
Re: Local clones aka forks disk size optimization
Sitaram Chamarty venit, vidit, dixit 15.11.2012 04:44:

> On Thu, Nov 15, 2012 at 7:04 AM, Andrew Ardill andrew.ard...@gmail.com wrote:
>> On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:
>>> Hi Andrew,
>>>
>>> Doing this would require that I keep track of which one comes from which, so it would imply some logic (and a db) on top of it. With the hardlinking way, it wouldn't require anything. The idea is that you don't have to do anything else on the server. I understand that it would be impossible to do for Windows users (unless using cygwin), but for *nix ones, yes...
>>>
>>> Javier Domingo
>>
>> Paraphrasing from git-clone(1): When cloning a repository, if the source repository is specified with /path/to/repo syntax, the default is to clone the repository by making a copy of HEAD and everything under the objects and refs directories. The files under the .git/objects/ directory are hardlinked to save space when possible. To force copying instead of hardlinking (which may be desirable if you are trying to make a back-up of your repository), --no-hardlinks can be used.
>>
>> So hardlinks should be used where possible, and if they are not, try upgrading Git. I think that covers all the use cases you have?
>
> I am not sure it does. My understanding is this:
>
> 'git clone -l' saves space on the initial clone, but subsequent pushes end up with the same objects duplicated across all the forks (assuming most of the forks keep up with some canonical repo).
>
> The alternates mechanism can give you ongoing savings (as long as you push to the main repo first), but it is "dangerous", in the words of the git-clone manpage. You have to be confident no one will delete a ref from the main repo and then do a gc, or let it auto-gc.
>
> He's looking for something that addresses both these issues.
>
> As an additional idea, I suspect this is what the namespaces feature was created for, but I am not sure, and have never played with it till now. Maybe someone who knows namespaces very well will chip in...
I dunno about namespaces, but a safe route with alternates seems to be: provide one main clone which is bare, pulls automatically, and is there to stay (no pruning), so that all the others can use it as a reliable alternates source.

Michael
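That scheme can be sketched end to end. In this sketch all the repo names and paths are made up, and `upstream` merely stands in for the canonical repository:

```shell
# Sketch of the "one stable bare main clone as alternates source" setup.
# Repo names and paths are illustrative.
set -e
top=$(mktemp -d)
cd "$top"

# Stand-in for the canonical upstream repository.
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial"

# The long-lived bare mirror: it only ever fetches (e.g. from cron),
# and is never pruned, so forks can safely borrow objects from it.
git clone -q --bare upstream main.git

# A fork that shares main.git's object store instead of copying it;
# --shared writes the alternates file for us.
git clone -q --shared main.git fork1

cat fork1/.git/objects/info/alternates   # points at main.git/objects
git -C fork1 fsck --no-progress          # fully connected via the alternate
```

The safety of the whole arrangement rests on main.git never pruning, exactly as described above; the forks hold no private copies of shared history.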
RE: Local clones aka forks disk size optimization
-----Original Message-----
From: Javier Domingo
Sent: Wednesday, November 14, 2012 8:15 PM

> Hi Andrew,
>
> Doing this would require that I keep track of which one comes from which, so it would imply some logic (and a db) on top of it. With the hardlinking way, it wouldn't require anything. The idea is that you don't have to do anything else on the server. I understand that it would be impossible to do for Windows users

Not true; it is a file system issue, not an OS issue. FAT does not support hard links, but ext2/3/4 and NTFS do.

> (unless using cygwin), but for *nix ones, yes...
>
> Javier Domingo
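The filesystem point is easy to check directly. A quick generic probe (nothing git-specific; it just creates two names for one file in a temp directory):

```shell
# Probe for hard-link support on the current filesystem.
# 'ln' fails outright on filesystems without hard links (e.g. FAT);
# on ext2/3/4, NTFS, etc. both names end up with link count 2.
set -e
probe=$(mktemp -d)
touch "$probe/a"
ln "$probe/a" "$probe/b"
links=$(ls -l "$probe/a" | awk '{print $2}')   # 2nd column of ls -l is the link count
echo "link count: $links"
```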
Re: Local clones aka forks disk size optimization
> Provide one main clone which is bare, pulls automatically, and is there to stay (no pruning), so that all others can use that as a reliable alternates source.

The problem here, IMHO, is the assumption that the main repo will never be cleaned up. But what to do if you don't want to let it grow forever?

hmm, distributed GC is a tricky problem. Maybe it could be easier having two kinds of alternates:

a) classical: gc+friends will drop local objects that are already there
b) fallback: normal operations fetch objects if they are not accessible from anywhere else, but gc+friends do not skip objects from there.

And extend the prune machinery to put some backup of the dropped objects into some separate store. This way we could use some kind of rotating archive:

* GC'ed objects will be stored in the backup repo for a while
* there are multiple active (rotating) backups kept for some time; each cycle, only the oldest one is dropped (and maybe objects in a newer backup are removed from the older ones)
* downstream repos must be synced often enough, so that removed objects are fetched back from the backups early enough

You could see this as some kind of heap:

* the currently active objects (directly referenced) are always on top
* once they're not referenced, they sink a level deeper
* when they're referenced again, they immediately jump back to the top
* at some point in time, unreferenced objects sink so deep that they're dropped completely

cu
Re: Local clones aka forks disk size optimization
On 15 November 2012 10:42, Javier Domingo javier...@gmail.com wrote:

> Hi,
>
> I have come up with this while doing some local forks for work. Currently, when you clone a repo using a path (not the file:/// protocol), you get all the common objects linked. But as you work, each one will continue growing its own way, although they may have common objects. Is there any way to avoid this? I mean, can something be done in git, so that it checks (when pulling) for the same objects in the other forks?

Have you seen alternates? From [1]:

How to share objects between existing repositories?

Do

    echo /source/git/project/.git/objects/ > .git/objects/info/alternates

and then follow it up with

    git repack -a -d -l

where the '-l' means that it will only put local objects in the pack-file (strictly speaking, it will put any loose objects from the alternate tree too, so you'll have a fully packed archive, but it won't duplicate objects that are already packed in the alternate tree).

[1] https://git.wiki.kernel.org/index.php/GitFaq#How_to_share_objects_between_existing_repositories.3F

Regards,
Andrew Ardill
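Run end to end, the FAQ recipe looks like this. The paths and repo names are made up, and the alternate repo is gc'ed first so that its objects are packed there (per the '-l' caveat above, loose alternate objects would otherwise still be packed locally):

```shell
# The FAQ steps above, end to end, between two pre-existing clones.
# Paths and repo names are made up for the sketch.
set -e
d=$(mktemp -d)
cd "$d"

git init -q project
git -C project -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "one"
git -C project gc -q                        # pack project's objects
git clone -q --no-hardlinks project copy    # a fully independent copy

# Point 'copy' at 'project's object store, then repack locally with -l
# so objects already packed in the alternate are not duplicated.
echo "$d/project/.git/objects" > copy/.git/objects/info/alternates
git -C copy repack -a -d -l

git -C copy fsck --no-progress   # still fully connected, via the alternate
```

After the repack, the shared history lives only in 'project'; 'copy' merely borrows it through the alternates file.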
Re: Local clones aka forks disk size optimization
Hi Andrew,

The problem with that is that if I want to delete the first repo, I will lose objects... Or does that repack also hard-link the objects into the other repos? I don't want to accidentally lose data, so it would be nice if, although it avoided repacking things, it also hardlinked them.

Javier Domingo
Re: Local clones aka forks disk size optimization
On 15 November 2012 11:40, Javier Domingo javier...@gmail.com wrote:

> Hi Andrew,
>
> The problem with that is that if I want to delete the first repo, I will lose objects... Or does that repack also hard-link the objects into the other repos? I don't want to accidentally lose data, so it would be nice if, although it avoided repacking things, it also hardlinked them.

Hi Javier, check out the section below the one I linked earlier:

How to stop sharing objects between repositories?

To copy the shared objects into the local repository, repack without the -l flag:

    git repack -a

Then remove the pointer to the alternate object store:

    rm .git/objects/info/alternates

(If the repository is edited between the two steps, it could become corrupted when the alternates file is removed. If you're unsure, you can use git fsck to check for corruption. If things go wrong, you can always recover by replacing the alternates file and starting over.)

Regards,
Andrew Ardill
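The unsharing recipe can be exercised the same way. In this sketch (made-up repo names), the original repo is deleted at the end to show the copy really has become self-contained:

```shell
# The FAQ's "stop sharing" steps: repack WITHOUT -l so every borrowed
# object is packed locally, then drop the alternates file.
# Repo names are made up for the sketch.
set -e
d=$(mktemp -d)
cd "$d"

git init -q project
git -C project -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "one"
git clone -q --shared project copy   # 'copy' borrows objects via alternates

git -C copy repack -a -d             # no -l: borrowed objects copied locally
rm copy/.git/objects/info/alternates

rm -rf project                       # the original can now disappear safely
git -C copy fsck --no-progress
```

This addresses the "if I delete the first repo, I will lose objects" worry above: once the repack has run and the alternates file is gone, nothing references the first repo anymore.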
Re: Local clones aka forks disk size optimization
Hi Andrew,

Doing this would require that I keep track of which one comes from which, so it would imply some logic (and a db) on top of it. With the hardlinking way, it wouldn't require anything. The idea is that you don't have to do anything else on the server. I understand that it would be impossible to do for Windows users (unless using cygwin), but for *nix ones, yes...

Javier Domingo
Re: Local clones aka forks disk size optimization
On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:

> Hi Andrew,
>
> Doing this would require that I keep track of which one comes from which, so it would imply some logic (and a db) on top of it. With the hardlinking way, it wouldn't require anything. The idea is that you don't have to do anything else on the server. I understand that it would be impossible to do for Windows users (unless using cygwin), but for *nix ones, yes...
>
> Javier Domingo

Paraphrasing from git-clone(1): When cloning a repository, if the source repository is specified with /path/to/repo syntax, the default is to clone the repository by making a copy of HEAD and everything under the objects and refs directories. The files under the .git/objects/ directory are hardlinked to save space when possible. To force copying instead of hardlinking (which may be desirable if you are trying to make a back-up of your repository), --no-hardlinks can be used.

So hardlinks should be used where possible, and if they are not, try upgrading Git. I think that covers all the use cases you have?

Regards,
Andrew Ardill
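The hardlinking described by git-clone(1) is easy to observe directly (a sketch; repo names are made up). After a plain local-path clone, a loose object file in the source is shared with the clone, which shows up in its link count; with --no-hardlinks the clone gets independent copies:

```shell
# Observing git-clone(1)'s hardlink behaviour on a local-path clone.
# Repo names are made up.
set -e
d=$(mktemp -d)
cd "$d"

git init -q src
git -C src -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial"

git clone -q src linked                  # default: loose objects hardlinked
git clone -q --no-hardlinks src copied   # forced full copy

obj=$(find src/.git/objects -type f | head -n 1)
links=$(ls -l "$obj" | awk '{print $2}')
echo "link count of $obj: $links"        # >= 2 when hardlinking worked
```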
Re: Local clones aka forks disk size optimization
On Thu, Nov 15, 2012 at 7:04 AM, Andrew Ardill andrew.ard...@gmail.com wrote:

> On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:
>> Hi Andrew,
>>
>> Doing this would require that I keep track of which one comes from which, so it would imply some logic (and a db) on top of it. With the hardlinking way, it wouldn't require anything. The idea is that you don't have to do anything else on the server. I understand that it would be impossible to do for Windows users (unless using cygwin), but for *nix ones, yes...
>>
>> Javier Domingo
>
> Paraphrasing from git-clone(1): When cloning a repository, if the source repository is specified with /path/to/repo syntax, the default is to clone the repository by making a copy of HEAD and everything under the objects and refs directories. The files under the .git/objects/ directory are hardlinked to save space when possible. To force copying instead of hardlinking (which may be desirable if you are trying to make a back-up of your repository), --no-hardlinks can be used.
>
> So hardlinks should be used where possible, and if they are not, try upgrading Git. I think that covers all the use cases you have?

I am not sure it does. My understanding is this:

'git clone -l' saves space on the initial clone, but subsequent pushes end up with the same objects duplicated across all the forks (assuming most of the forks keep up with some canonical repo).

The alternates mechanism can give you ongoing savings (as long as you push to the main repo first), but it is "dangerous", in the words of the git-clone manpage. You have to be confident no one will delete a ref from the main repo and then do a gc, or let it auto-gc.

He's looking for something that addresses both these issues.

As an additional idea, I suspect this is what the namespaces feature was created for, but I am not sure, and have never played with it till now. Maybe someone who knows namespaces very well will chip in...