Re: [git-users] Size of cloned git subtrees - only history / files for subtree needed

2012-08-30 Thread Haasip Satang
Hey, 

good point, could have mentioned that earlier. We are using ssh (not dumb 
http). Even a local clone created without specifying the file: protocol and 
with --no-hardlinks results in a big repo copy locally: 

git clone --depth 1 --no-hardlinks main -b subtrees/xyz xyz

If I clone with 

git clone --depth 1 --no-hardlinks file://path/to/main -b subtrees/xyz xyz

I end up with a small local clone though (actually --no-hardlinks doesn't 
matter in this case of course... the important thing is file://).
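
A note on why the two commands above differ: when the source is given as a plain local path, git clone bypasses the normal transport and copies (or hardlinks) the object store, and --depth is ignored in that mode. A file:// URL forces the regular pack-based transport, so --depth takes effect. The sketch below is one way to check which kind of clone you ended up with. The helper names is_shallow and git_dir_kib are made up for illustration, and the clone targets in the usage comment are placeholders.

```shell
#!/bin/sh
# is_shallow CLONE_DIR
# Hypothetical helper: prints "yes" if the clone was truncated by --depth.
# A shallow clone records its cut-off commits in .git/shallow.
is_shallow() {
    if [ -f "$1/.git/shallow" ]; then
        echo yes
    else
        echo no
    fi
}

# git_dir_kib CLONE_DIR
# Hypothetical helper: prints the size of the clone's .git directory in KiB.
git_dir_kib() {
    du -sk "$1/.git" | cut -f1
}

# Usage (xyz-path and xyz-file are placeholder directory names):
#   git clone --depth 1 main -b subtrees/xyz xyz-path
#   git clone --depth 1 file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz-file
#   is_shallow xyz-path; git_dir_kib xyz-path
#   is_shallow xyz-file; git_dir_kib xyz-file
```

If the first clone reports "no" and the second "yes", the size difference comes from --depth being honoured only in the second case.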

Remotely I always get a big one, no matter what I do... (also tried the 
native git protocol already, but by default we are planning to use ssh).

Any other ideas?

On Thursday, 30 August 2012 14:01:27 UTC+2, Antony Male wrote:
>
> On Thursday, 30 August 2012 11:34:52 UTC+1, Haasip Satang wrote:
>
>>
>> So the question actually is why does
>>
>> git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz
>>
>> give me a small clone (but only locally), while cloning from remote I 
>> get a big one.
>>
>
> What transport are you cloning over?
>
> When cloning over a "smart" transport (smart http(s), ssh, git://) a git 
> process on the local machine communicates with one on the remote machine, 
> and between them they negotiate which objects need to be transferred. The 
> remote process then compresses these objects into a custom packfile, and 
> this is transferred.
>
> When cloning over a "dumb" protocol (dumb http(s), ftp), there's no way of 
> spawning a git process on the remote machine. Therefore the local process 
> just has to download whatever packfiles are available. If there are no 
> packfiles corresponding to the objects required for a shallow clone, git 
> may (in the worst case) end up downloading the entirety of your history, 
> even if it doesn't need to. The same goes for fetches: if the only way to 
> get the required new objects is to download everything, this is what git 
> has to do.
>
> I suspect you're cloning over a dumb transport, and this is what's causing 
> the effects you're seeing. Smart http(s) has been supported by git for a 
> long time, and, although trickier to set up on the remote side, is 
> definitely worth it.
>
> Hope that helps,
> Antony 
>

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/git-users/-/yIC23MFPpNkJ.
To post to this group, send email to git-users@googlegroups.com.
To unsubscribe from this group, send email to 
git-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/git-users?hl=en.



Re: [git-users] Size of cloned git subtrees - only history / files for subtree needed

2012-08-30 Thread Antony Male
On Thursday, 30 August 2012 11:34:52 UTC+1, Haasip Satang wrote:

>
> So the question actually is why does
>
> git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz
>
> give me a small clone (but only locally), while cloning from remote I 
> get a big one.
>

What transport are you cloning over?

When cloning over a "smart" transport (smart http(s), ssh, git://) a git 
process on the local machine communicates with one on the remote machine, 
and between them they negotiate which objects need to be transferred. The 
remote process then compresses these objects into a custom packfile, and 
this is transferred.

When cloning over a "dumb" protocol (dumb http(s), ftp), there's no way of 
spawning a git process on the remote machine. Therefore the local process 
just has to download whatever packfiles are available. If there are no 
packfiles corresponding to the objects required for a shallow clone, git 
may (in the worst case) end up downloading the entirety of your history, 
even if it doesn't need to. The same goes for fetches: if the only way to 
get the required new objects is to download everything, this is what git 
has to do.
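
To make the distinction above concrete, here is a rough sketch that guesses the transport class from the clone URL alone. guess_transport is a hypothetical helper, not part of git, and it only looks at the URL form: for http(s) the URL cannot tell you whether the server side is smart or dumb, so that case is reported as undecided.

```shell
#!/bin/sh
# guess_transport URL
# Hypothetical helper: classifies a clone URL by its form only.
guess_transport() {
    case "$1" in
        ssh://*|git://*)    echo "smart" ;;
        file://*)           echo "smart" ;;      # upload-pack runs on the same machine
        http://*|https://*) echo "undecided" ;;  # smart or dumb, depends on the server
        ftp://*|ftps://*)   echo "dumb" ;;
        /*|./*|../*)        echo "local" ;;      # plain path: local-clone shortcut
        *:*)                echo "smart" ;;      # scp-like user@host:path means ssh
        *)                  echo "local" ;;      # relative path such as "main"
    esac
}

# Usage:
#   guess_transport ssh://host/repo.git
#   guess_transport /home/me/gitTests/subtreeRepo
```

For an http(s) remote, one way to decide is to request <url>/info/refs?service=git-upload-pack and look at the response: a smart server answers with the content type application/x-git-upload-pack-advertisement, while a dumb one just serves the plain info/refs file.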

I suspect you're cloning over a dumb transport, and this is what's causing 
the effects you're seeing. Smart http(s) has been supported by git for a 
long time, and, although trickier to set up on the remote side, is 
definitely worth it.

Hope that helps,
Antony 




Re: [git-users] Size of cloned git subtrees - only history / files for subtree needed

2012-08-30 Thread Haasip Satang
;-) That's what I did, as you can see in my explanation above ;-) The 
problem still seems to be that it only works locally on the same Linux 
machine. When I try to clone from any remote machine (no matter which OS) 
I end up getting the huge .git folder. 

So the question actually is why does

git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz

give me a small clone (but only locally), while cloning from remote I get 
a big one. 

And as mentioned earlier as well, when cloning the small xyz from remote 
then I end up with what I wanna have; a small xyz project on a remote 
machine. 

Why can I not directly clone xyz remotely and get the same result as with 
the local clone?

On Thursday, 30 August 2012 09:00:04 UTC+2, Philip Oakley wrote:
>
>  Isn't a shallow clone a good use case for this? You only need the latest 
> commit of each project you want to build and then it either works or it 
> doesn't, and the clone is then deleted. 
>  
> So is 'git clone --depth <depth>' what you need? 
> Use <depth> := 1
>   
> Just a thought
>  
> Philip

Re: [git-users] Size of cloned git subtrees - only history / files for subtree needed

2012-08-30 Thread Philip Oakley
Isn't a shallow clone a good use case for this? You only need the latest commit 
of each project you want to build and then it either works or it doesn't, and 
the clone is then deleted. 

So is 'git clone --depth <depth>' what you need? 
Use <depth> := 1

Just a thought

Philip





[git-users] Size of cloned git subtrees - only history / files for subtree needed

2012-08-29 Thread Haasip Satang
Hi all, 

In short, the question behind the lengthy explanation below is: how can I 
create a clone of a subtree that only contains the data needed for that 
subtree in the .git folder?

In detail, here is what I have tried already and what my setup looks like: 
We have a big repository containing multiple projects (political 
reasons, cannot avoid having that... at least for now). While this works 
fine for all the developers (they just clone the big repo and get all the 
projects they need), we are facing problems with our continuous build 
system (Jenkins). 

Here we would like to have a job for each single project; of course WITHOUT 
having to clone the whole big repo for every job, as this would lead to a 
significant overhead on disk. 

After searching around for some time I basically came across four potential 
solutions: 

1. Sparse Checkout
2. Submodules
3. Individual Repos with a manager script like repo, mr, git-status, and 
all the others that exist to tackle that problem
4. Subtrees

The problem with 1 is that you still have to clone the whole repo (including 
all history), only to then check out a part of it --> still disk overhead. 
As for submodules, I personally don't really like them and don't think they 
should be used in this case; they are kinda difficult to handle and can 
be fragile anyway. 
The additional script-based solution seems kinda hacky as well, so I didn't 
really follow up on that too much. 

So my favorite solution so far is actually using git subtree, which is more 
or less easy (especially since the subtree branches are only used for the 
CI builds / in a read only way, nothing needs to be pushed back to the 
bigrepo). 

The problem is, however, that when I clone the bare repo, create the subtree 
branches in the cloned working copy and then try to clone only these subtree 
branches, I still seem to get the whole big history, including all the 
stuff outside the tree. 

Is there any way to avoid that and create a synthetic project history 
containing only data relevant for the subtree? 

What I did to kinda get there is rather hacky. I create the subtree 
branch using: 

git subtree split --prefix=xyz --annotate="[xy] " --rejoin -b subtrees/xyz

Then I clone that with: 

git clone --depth 1 --no-hardlinks file:///home/me/gitTests/subtreeRepo -b subtrees/xyz xyz

So creating a shallow clone (depth 1) seems to be the only way, and that 
also only works on the local Linux machine. If I clone the same subtreeRepo 
branch on a remote machine I actually get the whole big pack / history with 
it (which I of course don't want). 

So what I did is I cloned the subtree branch locally and then cloned that 
repo from my remote Jenkins machine. While this seems to work (I haven't 
looked into whether I'm getting the necessary change sets to send out the 
emails yet), it seems both unnecessarily complicated and very hacky. 
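
The two local steps described above (split the subtree into a branch, then shallow-clone that branch over file://) can be sketched as a small shell wrapper. This is only an illustration under assumptions: export_subtree and file_url are hypothetical helper names, the repo argument is assumed to be a working copy (not a bare repo), and the --annotate and --rejoin options from the command quoted above are left out to keep the sketch short.

```shell
#!/bin/sh
# file_url DIR
# Hypothetical helper: prints an absolute file:// URL for an existing directory.
file_url() {
    printf 'file://%s\n' "$(cd "$1" && pwd)"
}

# export_subtree REPO PREFIX BRANCH DEST
# Hypothetical wrapper around the two steps from the message:
#   1. split PREFIX out of REPO into BRANCH (history of that directory only)
#   2. shallow-clone just BRANCH into DEST, using file:// so --depth is honoured
export_subtree() {
    repo=$1 prefix=$2 branch=$3 dest=$4
    ( cd "$repo" && git subtree split --prefix="$prefix" -b "$branch" ) || return 1
    git clone --depth 1 --branch "$branch" "$(file_url "$repo")" "$dest"
}

# Usage, with the names from the message:
#   export_subtree /home/me/gitTests/subtreeRepo xyz subtrees/xyz xyz
```

The resulting xyz directory is the small intermediate repo that the remote Jenkins machine then clones; the sketch does not change what a direct remote clone of the big repo transfers.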

To sum up, let me conclude with the question from the beginning: how can I 
create a clone of a subtree that only contains the data needed for that 
subtree in the .git folder?

Looking forward to your comments and ideas :)

Thanks, Haasip





