Re: Local clones aka forks disk size optimization

2012-11-18 Thread Sitaram Chamarty
On Fri, Nov 16, 2012 at 11:34 PM, Enrico Weigelt enrico.weig...@vnc.biz wrote:

 Provide one main clone which is bare, pulls automatically, and is
 there to stay (no pruning), so that all others can use that as a
 reliable alternates source.

 The problem here, IMHO, is the assumption that the main repo will
 never be cleaned up. But what to do if you don't want to let it grow
 forever?

That's not the only problem.  I believe you only get the savings when
the main repo gets the commits first.  Which is probably ok most of
the time but it's worth mentioning.


 hmm, distributed GC is a tricky problem.

Except for one little issue (see other thread, subject line "cloning a
namespace downloads all the objects"), namespaces appear to do
everything we want in terms of the typical use cases for alternates
and/or 'git clone -l', at least on the server side.
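For concreteness, here is a minimal sketch of the server-side sharing that
namespaces give you (all paths here are throwaway temp directories, and the
'alice'/'bob' fork names are made up for illustration):

```shell
set -e
tmp=$(mktemp -d)

# One ordinary repo with a single commit to act as the history source.
git init -q "$tmp/work"
git -C "$tmp/work" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init

# One bare "hub" holding two forks as namespaces over a single object store.
git init -q --bare "$tmp/hub.git"
git -C "$tmp/work" push -q "$tmp/hub.git" \
    HEAD:refs/namespaces/alice/refs/heads/master
git -C "$tmp/work" push -q "$tmp/hub.git" \
    HEAD:refs/namespaces/bob/refs/heads/master

# GIT_NAMESPACE is honored by upload-pack, so over the file:// transport a
# client only ever sees the refs of the requested namespace.
GIT_NAMESPACE=alice git clone -q "file://$tmp/hub.git" "$tmp/alice-view"
GIT_NAMESPACE=alice git ls-remote "file://$tmp/hub.git"
```

Both forks' refs live in one object database in hub.git, which is the
alternates-like saving, while each clone only sees its own namespace.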
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Local clones aka forks disk size optimization

2012-11-18 Thread Enrico Weigelt
Hi,

 That's not the only problem.  I believe you only get the savings when
 the main repo gets the commits first.  Which is probably ok most of
 the time but it's worth mentioning.

Well, the saving will just be deferred to the point where the commit
finally reaches the main repo and the downstreams are gc'ed.

  hmm, distributed GC is a tricky problem.
 
 Except for one little issue (see other thread, subject line "cloning a
 namespace downloads all the objects"), namespaces appear to do
 everything we want in terms of the typical use cases for alternates
 and/or 'git clone -l', at least on the server side.

hmm, I'm not sure about the actual internals, but that namespace
filtering should work in such a way that a local clone never sees (or
considers) remote refs outside of the requested namespace. Perhaps it
should be handled entirely on the server side, so that all the
commands involved treat these refs as nonexistent.

By the way: what happens if one tries to clone from a broken repo
(one which has several refs pointing to nonexisting objects)?
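A quick local experiment sketches the answer (throwaway temp paths; we delete
a loose object out from under a ref and see what fsck and clone report):

```shell
set -e
tmp=$(mktemp -d)

git init -q "$tmp/broken"
git -C "$tmp/broken" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m one

# Simulate corruption: remove the loose commit object HEAD points at.
obj=$(git -C "$tmp/broken" rev-parse HEAD)
rm "$tmp/broken/.git/objects/$(echo "$obj" | sed 's|^\(..\)|\1/|')"

# fsck notices the ref pointing at a missing object...
if git -C "$tmp/broken" fsck >/dev/null 2>&1; then
    fsck_status=clean
else
    fsck_status=broken
fi
# ...and a clone over the file:// transport refuses to complete, because
# upload-pack cannot produce a pack for the advertised ref.
if git clone -q "file://$tmp/broken" "$tmp/check" >/dev/null 2>&1; then
    clone_status=succeeded
else
    clone_status=failed
fi
echo "fsck: $fsck_status, clone: $clone_status"
```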


cu
-- 
Mit freundlichen Grüßen / Kind regards 

Enrico Weigelt 
VNC - Virtual Network Consult GmbH 
Head Of Development 

Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59

enrico.weig...@vnc.biz; www.vnc.de 


Re: Local clones aka forks disk size optimization

2012-11-18 Thread Jörg Rosenkranz
2012/11/15 Javier Domingo javier...@gmail.com

 Is there any way to avoid this? I mean, can something be done in git
 so that, when pulling, it checks for the same objects in the other forks?


I've been using git-new-workdir
(https://github.com/git/git/blob/master/contrib/workdir/git-new-workdir)
for a similar problem. Maybe that's what you're searching for?

Joerg.


Re: Local clones aka forks disk size optimization

2012-11-16 Thread Michael J Gruber
Sitaram Chamarty venit, vidit, dixit 15.11.2012 04:44:
 On Thu, Nov 15, 2012 at 7:04 AM, Andrew Ardill andrew.ard...@gmail.com wrote:
 On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:
 Hi Andrew,

  Doing this would require tracking which object came from which repo,
  so it would imply some logic (and a db) on top of it. With the
  hardlinking approach, nothing extra is required. The idea is that you
  don't have to do anything else on the server.

  I understand that it would be impossible to do it for Windows users
  (but using cygwin), but for *nix ones yes...
 Javier Domingo

 Paraphrasing from git-clone(1):

 When cloning a repository, if the source repository is specified with
 /path/to/repo syntax, the default is to clone the repository by making
 a copy of HEAD and everything under objects and refs directories. The
 files under .git/objects/ directory are hardlinked to save space when
 possible. To force copying instead of hardlinking (which may be
 desirable if you are trying to make a back-up of your repository)
 --no-hardlinks can be used.

  So hardlinks should be used where possible, and if they are not, try
  upgrading Git.

 I think that covers all the use cases you have?
 
 I am not sure it does.  My understanding is this:
 
 'git clone -l' saves space on the initial clone, but subsequent pushes
 end up with the same objects duplicated across all the forks
 (assuming most of the forks keep up with some canonical repo).
 
 The alternates mechanism can give you ongoing savings (as long as you
 push to the main repo first), but it is "dangerous", in the words of
 the git-clone manpage.  You have to be confident no one will delete a
 ref from the main repo and then run a gc or let it auto-gc.
 
 He's looking for something that addresses both these issues.
 
 As an additional idea, I suspect this is what the namespaces feature
 was created for, but I am not sure, and have never played with it till
 now.
 
 Maybe someone who knows namespaces very well will chip in...
 

I dunno about namespaces, but a safe route with alternates seems to be:

Provide one main clone which is bare, pulls automatically, and is
there to stay (no pruning), so that all others can use that as a
reliable alternates source.

Michael


RE: Local clones aka forks disk size optimization

2012-11-16 Thread Pyeron, Jason J CTR (US)
 -Original Message-
 From: Javier Domingo
 Sent: Wednesday, November 14, 2012 8:15 PM
 
 Hi Andrew,
 
 Doing this would require tracking which object came from which repo,
 so it would imply some logic (and a db) on top of it. With the
 hardlinking approach, nothing extra is required. The idea is that you
 don't have to do anything else on the server.
 
 I understand that it would be impossible to do it for Windows users

Not true; it is a file system issue, not an OS issue. FAT does not
support hard links, but ext2/3/4 and NTFS do.

 (but using cygwin), but for *nix ones yes...
 Javier Domingo





Re: Local clones aka forks disk size optimization

2012-11-16 Thread Enrico Weigelt

 Provide one main clone which is bare, pulls automatically, and is
 there to stay (no pruning), so that all others can use that as a
 reliable alternates source.

The problem here, IMHO, is the assumption that the main repo will
never be cleaned up. But what to do if you don't want to let it grow
forever?

hmm, distributed GC is a tricky problem.

maybe it could be easier to have two kinds of alternates:

a) classical: gc+friends will drop local objects that are
   already there
b) fallback: normal operations fetch objects if they aren't accessible
   from anywhere else, but gc+friends do not skip objects from there.

And extend the prune machinery to put a backup of the dropped objects
into some separate store.

This way we could use some kind of rotating archive:

* GC'ed objects will be stored in the backup repo for a while
* there are multiple active (rotating) backups kept for some time;
  each cycle, only the oldest one is dropped (and maybe objects
  in a newer backup are removed from the older ones)
* downstream repos must be synced often enough that removed objects
  are fetched back from the backups early enough

You could see this as a kind of heap:

* the currently active objects (directly referenced) are always
  on the top
* once they're not referenced, they sink a level deeper
* when they're referenced again, they immediately jump back to the top
* at some point in time, unreferenced objects sink so deep that
  they're dropped completely



cu
-- 
Mit freundlichen Grüßen / Kind regards 

Enrico Weigelt 
VNC - Virtual Network Consult GmbH 
Head Of Development 

Pariser Platz 4a, D-10117 Berlin
Tel.: +49 (30) 3464615-20
Fax: +49 (30) 3464615-59

enrico.weig...@vnc.biz; www.vnc.de 


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Andrew Ardill
On 15 November 2012 10:42, Javier Domingo javier...@gmail.com wrote:
 Hi,

 I have come up with this while doing some local forks for work.
 Currently, when you clone a repo using a path (not file:/// protocol)
 you get all the common objects linked.

 But as you work, each one will continue growing on its way, although
 they may have common objects.

 Is there any way to avoid this? I mean, can something be done in git
 so that, when pulling, it checks for the same objects in the other forks?

Have you seen alternates? From [1]:

 How to share objects between existing repositories?
 ---

 Do

 echo /source/git/project/.git/objects/ >> .git/objects/info/alternates

 and then follow it up with

 git repack -a -d -l

 where the '-l' means that it will only put local objects in the pack-file
 (strictly speaking, it will put any loose objects from the alternate tree
 too, so you'll have a fully packed archive, but it won't duplicate objects
 that are already packed in the alternate tree).

[1] 
https://git.wiki.kernel.org/index.php/GitFaq#How_to_share_objects_between_existing_repositories.3F
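Put together, the recipe looks like this end to end (a sketch using throwaway
temp paths in place of /source/git/project):

```shell
set -e
tmp=$(mktemp -d)

# A "source" repo whose objects we want to borrow.
git init -q "$tmp/project"
git -C "$tmp/project" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m base

# An existing fork with its own full copy of the objects.
git clone -q --no-hardlinks "$tmp/project" "$tmp/fork"

# Point the fork at the source's object store...
echo "$tmp/project/.git/objects" >> "$tmp/fork/.git/objects/info/alternates"

# ...and repack; '-l' keeps objects that are available from the
# alternate out of the fork's own pack.
git -C "$tmp/fork" repack -a -d -l

# The fork stays fully usable, now borrowing objects from the source.
git -C "$tmp/fork" fsck
```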


Regards,

Andrew Ardill


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Javier Domingo
Hi Andrew,

The problem with that is that if I want to delete the first repo, I
will lose objects... Or does that repack also hard-link the objects
in the other repos? I don't want to accidentally lose data, so it
would be nice if, besides avoiding repacking, it would also hardlink
them.
Javier Domingo


2012/11/15 Andrew Ardill andrew.ard...@gmail.com:
 On 15 November 2012 10:42, Javier Domingo javier...@gmail.com wrote:
 Hi,

 I have come up with this while doing some local forks for work.
 Currently, when you clone a repo using a path (not file:/// protocol)
 you get all the common objects linked.

 But as you work, each one will continue growing on its way, although
 they may have common objects.

 Is there any way to avoid this? I mean, can something be done in git
 so that, when pulling, it checks for the same objects in the other forks?

 Have you seen alternates? From [1]:

 How to share objects between existing repositories?
 ---

 Do

 echo /source/git/project/.git/objects/ >> .git/objects/info/alternates

 and then follow it up with

 git repack -a -d -l

 where the '-l' means that it will only put local objects in the pack-file
 (strictly speaking, it will put any loose objects from the alternate tree
 too, so you'll have a fully packed archive, but it won't duplicate objects
 that are already packed in the alternate tree).

 [1] 
 https://git.wiki.kernel.org/index.php/GitFaq#How_to_share_objects_between_existing_repositories.3F


 Regards,

 Andrew Ardill


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Andrew Ardill
On 15 November 2012 11:40, Javier Domingo javier...@gmail.com wrote:
 Hi Andrew,

 The problem with that is that if I want to delete the first repo, I
 will lose objects... Or does that repack also hard-link the objects
 in the other repos? I don't want to accidentally lose data, so it
 would be nice if, besides avoiding repacking, it would also hardlink
 them.

Hi Javier, check out the section below the one I linked earlier:

 How to stop sharing objects between repositories?

 To copy the shared objects into the local repository, repack without the -l flag

 git repack -a

 Then remove the pointer to the alternate object store

 rm .git/objects/info/alternates

 (If the repository is edited between the two steps, it could become corrupted
 when the alternates file is removed. If you're unsure, you can use git fsck to
 check for corruption. If things go wrong, you can always recover by replacing
 the alternates file and starting over).
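The detach sequence, end to end, looks like this (a sketch with throwaway
temp paths; '--shared' on the clone sets up the alternates pointer that we
then remove):

```shell
set -e
tmp=$(mktemp -d)

git init -q "$tmp/main"
git -C "$tmp/main" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m base

# '--shared' clones by reference: the fork borrows main's objects
# via .git/objects/info/alternates instead of copying them.
git clone -q --shared "$tmp/main" "$tmp/fork"

# 1. Copy the shared objects into the fork's own pack (no '-l').
git -C "$tmp/fork" repack -a -d
# 2. Drop the pointer to the alternate object store.
rm "$tmp/fork/.git/objects/info/alternates"

# The fork must now be self-contained:
git -C "$tmp/fork" fsck
```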

Regards,

Andrew Ardill


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Javier Domingo
Hi Andrew,

Doing this would require tracking which object came from which repo,
so it would imply some logic (and a db) on top of it. With the
hardlinking approach, nothing extra is required. The idea is that you
don't have to do anything else on the server.

I understand that it would be impossible to do it for Windows users
(but using cygwin), but for *nix ones yes...
Javier Domingo


2012/11/15 Andrew Ardill andrew.ard...@gmail.com:
 On 15 November 2012 11:40, Javier Domingo javier...@gmail.com wrote:
 Hi Andrew,

 The problem with that is that if I want to delete the first repo, I
 will lose objects... Or does that repack also hard-link the objects
 in the other repos? I don't want to accidentally lose data, so it
 would be nice if, besides avoiding repacking, it would also hardlink
 them.

 Hi Javier, check out the section below the one I linked earlier:

 How to stop sharing objects between repositories?

 To copy the shared objects into the local repository, repack without the -l flag

 git repack -a

 Then remove the pointer to the alternate object store

 rm .git/objects/info/alternates

 (If the repository is edited between the two steps, it could become corrupted
 when the alternates file is removed. If you're unsure, you can use git fsck to
 check for corruption. If things go wrong, you can always recover by replacing
 the alternates file and starting over).

 Regards,

 Andrew Ardill


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Andrew Ardill
On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:
 Hi Andrew,

 Doing this would require tracking which object came from which repo,
 so it would imply some logic (and a db) on top of it. With the
 hardlinking approach, nothing extra is required. The idea is that you
 don't have to do anything else on the server.

 I understand that it would be impossible to do it for Windows users
 (but using cygwin), but for *nix ones yes...
 Javier Domingo

Paraphrasing from git-clone(1):

When cloning a repository, if the source repository is specified with
/path/to/repo syntax, the default is to clone the repository by making
a copy of HEAD and everything under objects and refs directories. The
files under .git/objects/ directory are hardlinked to save space when
possible. To force copying instead of hardlinking (which may be
desirable if you are trying to make a back-up of your repository)
--no-hardlinks can be used.

So hardlinks should be used where possible, and if they are not, try
upgrading Git.

I think that covers all the use cases you have?
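A quick way to confirm the hardlinking on a given filesystem is to compare
link counts after a local clone (a sketch with throwaway paths; `stat -c %h`
is the GNU coreutils spelling, `stat -f %l` the BSD one):

```shell
set -e
tmp=$(mktemp -d)

git init -q "$tmp/src"
echo hello > "$tmp/src/file"
git -C "$tmp/src" add file
git -C "$tmp/src" -c user.email=a@example.com -c user.name=a \
    commit -q -m add
# Pack everything so there is a single file whose link count we can check.
git -C "$tmp/src" repack -a -d -q

# A plain path clone hardlinks files under .git/objects where possible.
git clone -q "$tmp/src" "$tmp/copy"

pack=$(ls "$tmp/src"/.git/objects/pack/*.pack)
links=$(stat -c %h "$pack" 2>/dev/null || stat -f %l "$pack")
echo "pack file link count: $links"
```

On a filesystem that supports hard links, the pack file's link count rises
above 1 after the clone, showing the objects are stored only once.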

Regards,

Andrew Ardill


Re: Local clones aka forks disk size optimization

2012-11-14 Thread Sitaram Chamarty
On Thu, Nov 15, 2012 at 7:04 AM, Andrew Ardill andrew.ard...@gmail.com wrote:
 On 15 November 2012 12:15, Javier Domingo javier...@gmail.com wrote:
 Hi Andrew,

 Doing this would require tracking which object came from which repo,
 so it would imply some logic (and a db) on top of it. With the
 hardlinking approach, nothing extra is required. The idea is that you
 don't have to do anything else on the server.

 I understand that it would be impossible to do it for Windows users
 (but using cygwin), but for *nix ones yes...
 Javier Domingo

 Paraphrasing from git-clone(1):

 When cloning a repository, if the source repository is specified with
 /path/to/repo syntax, the default is to clone the repository by making
 a copy of HEAD and everything under objects and refs directories. The
 files under .git/objects/ directory are hardlinked to save space when
 possible. To force copying instead of hardlinking (which may be
 desirable if you are trying to make a back-up of your repository)
 --no-hardlinks can be used.

 So hardlinks should be used where possible, and if they are not, try
 upgrading Git.

 I think that covers all the use cases you have?

I am not sure it does.  My understanding is this:

'git clone -l' saves space on the initial clone, but subsequent pushes
end up with the same objects duplicated across all the forks
(assuming most of the forks keep up with some canonical repo).

The alternates mechanism can give you ongoing savings (as long as you
push to the main repo first), but it is "dangerous", in the words of
the git-clone manpage.  You have to be confident no one will delete a
ref from the main repo and then run a gc or let it auto-gc.

He's looking for something that addresses both these issues.

As an additional idea, I suspect this is what the namespaces feature
was created for, but I am not sure, and have never played with it till
now.

Maybe someone who knows namespaces very well will chip in...