Re: Stalled git cloning and possible solutions

2013-09-03 Thread V.Krishn
On Friday, August 30, 2013 03:48:44 AM you wrote:
 V.Krishn vkris...@gmail.com writes:
  On Friday, August 30, 2013 02:40:34 AM you wrote:
  V.Krishn wrote:
   Quite often, when cloning a large repo stalls, hitting Ctrl+C
   cleans up what has been downloaded, and the process needs to be restarted.
   
   Is there a way to recover or continue from the already downloaded files
   during cloning?
  
  No, sadly.  The pack sent for a clone is generated dynamically, so
  there's no easy way to support the equivalent of an HTTP Range request
  to resume.  Someone might implement an appropriate protocol extension
  to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
  but for now it doesn't exist.
  
  This is what I tried but then realized something more is needed:
  
  During a stalled clone, avoid Ctrl+C.
  1. Copy the content, i.e. the .git folder, to some other place.
  2. cd new dir
  3. git config fetch.unpackLimit 99
  4. git config transfer.unpackLimit 99
 
 These two steps will not help, as negotiation between the sender and
 the receiver is based on the commits that are known to be complete,
 and an earlier failed fetch will not (and should not) update refs
 on the receiver's side.
 
  What you *can* do today is create a bundle from the large repo
  somewhere with a reliable connection and then grab that using a
  resumable transport such as HTTP.
 
 Yes.
 
 Another possibility is, if the project being cloned has a tag (or a
 branch) that points at a commit back when it was smaller, do this
 
 git init x
 cd x
 git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small
 
 to prime the object store of a temporary repository 'x' with a
 hopefully smaller transfer, and then use it as a --reference
 repository to the real clone.

What more files/info would be needed?
I noticed the tmp_pack_xx may not yet contain objects of type commit/tree.
Do I need to manually create .git/refs..?

I was wondering whether the following would further help in recovery.

A
1. If the pack file were created in commit-history (date) order, i.e.
   blob+commit+tree+tags...+blob+commit+tree,
   and if an idx (or at least a tmp idx) were also created in parallel.
2. Update the other files in the .git dir before the pack process
   (as stated in the previous email).
3. Objects are named like datestamp(epoch)+sha1
   and stored in an epoch directory (date fmt can be yymmdd).
   (This might break backward compatibility.)
4. Add git fsck --defrag [1..4]
   # this could take another parameter, such as a level,
   applying various heuristic optimizations.

B
Another option would be:
git clone url --use-method=rsync
This would transfer the files in the .git dir as-is (the necessary ones),
and run `git gc` or any other housekeeping upon completion.
This method would allow resuming.
Cons:
  Any change to a pack file on the server during the download becomes a potential issue.
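
There is no --use-method=rsync option in git clone; as a rough approximation of
the idea (a resumable raw transfer of the object store), assuming shell access
to the server and with the host path purely illustrative, one could do
something like:

  # copy the server-side repository directory verbatim; --partial lets an
  # interrupted transfer be resumed by simply re-running the command
  rsync -az --partial --progress git.example.org:/srv/git/project.git/ project.git/

  # then clone locally from the raw copy and let git do its housekeeping
  git clone project.git project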

Clone resume may not be a priority, but if minor changes can help with
recovery, that would be nice.

I still like the bundle method, if Git hosting services made it easy.

-- 
Regards.
V.Krishn


Re: Stalled git cloning and possible solutions

2013-08-30 Thread Duy Nguyen
On Fri, Aug 30, 2013 at 4:10 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 V.Krishn wrote:

 Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans up what
 has been downloaded, and the process needs to be restarted.

 Is there a way to recover or continue from the already downloaded files during
 cloning?

 No, sadly.  The pack sent for a clone is generated dynamically, so
 there's no easy way to support the equivalent of an HTTP Range request
 to resume.  Someone might implement an appropriate protocol extension
 to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
 but for now it doesn't exist.

OK, how about a new capability "resume" for upload-pack? fetch-pack can
then send the capability "resume[=SHA-1,skip]" to upload-pack. The
first time, it sends "resume" without parameters, and upload-pack will
send back an SHA-1 to identify the pack being transferred, together
with a full pack as usual. When an early disconnection happens, fetch-pack
sends the received SHA-1 and the size of the pack received so far. It then
either receives the remaining part, or a full pack.

When upload-pack gets "resume", it calculates a checksum of all input
that may impact pack generation. If the checksum matches the SHA-1
from fetch-pack, it'll continue to generate the pack as usual, but
will skip sending the first "skip" bytes (maybe with a fake header so
that fetch-pack realizes this is a partial pack). If the checksum does
not match, it sends a full pack again. I count on index-pack to spot
a corrupt resumed pack caused by bugs.

The input to calculate the SHA-1 checksum includes:

 - the result SHA-1 list from rev-list
 - git version string
 - .git/shallow
 - replace object database
 - pack.* config
 - maybe some other variables (I haven't checked pack-objects)
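
A rough sketch of how such a checksum could be derived from those inputs with
plain git commands (purely illustrative of the idea, not an existing interface;
the branch name is an assumption):

  {
    git rev-list --objects master | cut -d' ' -f1   # result SHA-1 list from rev-list
    git --version                                   # git version string
    cat .git/shallow 2>/dev/null                    # shallow boundaries, if any
    git for-each-ref refs/replace/                  # replace object database
    git config --get-regexp '^pack\.' | sort        # pack.* configuration
  } | sha1sum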

Another Git implementation can generate this SHA-1 in a totally
different way and may even cache the generated pack.

If, at resume time, the load balancer directs the request to another
upload-pack that generates this SHA-1 differently, OK, this won't work
(i.e. a full pack is returned). In a busy repository, some refs may have
moved, so the rev-list result at resume time won't match any more, but
we can deal with that later by relaxing the rules to allow "want" lines with
SHA-1s that are reachable from current refs, not just one of the refs
(pack v4 or reachability bitmaps would help).
-- 
Duy


Re: Stalled git cloning and possible solutions

2013-08-30 Thread Duy Nguyen
On Fri, Aug 30, 2013 at 7:17 PM, Duy Nguyen pclo...@gmail.com wrote:
 OK, how about a new capability "resume" for upload-pack? fetch-pack can
 then send the capability "resume[=SHA-1,skip]" to upload-pack. The
 first time, it sends "resume" without parameters, and upload-pack will
 send back an SHA-1 to identify the pack being transferred, together
 with a full pack as usual. When an early disconnection happens, fetch-pack
 sends the received SHA-1 and the size of the pack received so far. It then
 either receives the remaining part, or a full pack.

 When upload-pack gets "resume", it calculates a checksum of all input
 that may impact pack generation. If the checksum matches the SHA-1
 from fetch-pack, it'll continue to generate the pack as usual, but
 will skip sending the first "skip" bytes (maybe with a fake header so
 that fetch-pack realizes this is a partial pack). If the checksum does
 not match, it sends a full pack again. I count on index-pack to spot
 a corrupt resumed pack caused by bugs.

 The input to calculate the SHA-1 checksum includes:

  - the result SHA-1 list from rev-list
  - git version string
  - .git/shallow
  - replace object database
  - pack.* config
  - maybe some other variables (I haven't checked pack-objects)

I should have tested something before I wrote. --threads adds some
randomness to pack generation, so it would have to be --threads=1. Not sure if
Git repository hosts would be happy with that...
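
For reference, the thread count can already be pinned with an existing setting;
e.g. a repository host could set the following (shown only to illustrate the
determinism-versus-speed trade-off being discussed):

  # make pack generation single-threaded so repeated runs produce identical bytes
  git config pack.threads 1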

 Another Git implementation can generate this SHA-1 in a totally
 different way and may even cache the generated pack.

 If, at resume time, the load balancer directs the request to another
 upload-pack that generates this SHA-1 differently, OK, this won't work
 (i.e. a full pack is returned). In a busy repository, some refs may have
 moved, so the rev-list result at resume time won't match any more, but
 we can deal with that later by relaxing the rules to allow "want" lines with
 SHA-1s that are reachable from current refs, not just one of the refs
 (pack v4 or reachability bitmaps would help).
 --
 Duy



-- 
Duy


Stalled git cloning and possible solutions

2013-08-29 Thread V.Krishn
Hi,

Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans up what
has been downloaded, and the process needs to be restarted.

Is there a way to recover or continue from the already downloaded files during
cloning?
Please point me to an archive URL if a solution exists. (Though I continue to
search through the archives as I email this.)

Can there be something like:
git clone url --use-method=rsync

-- 
Regards.
V.Krishn


Re: Stalled git cloning and possible solutions

2013-08-29 Thread Jonathan Nieder
V.Krishn wrote:

 Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans up what
 has been downloaded, and the process needs to be restarted.

 Is there a way to recover or continue from the already downloaded files during
 cloning?

No, sadly.  The pack sent for a clone is generated dynamically, so
there's no easy way to support the equivalent of an HTTP Range request
to resume.  Someone might implement an appropriate protocol extension
to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
but for now it doesn't exist.

What you *can* do today is create a bundle from the large repo
somewhere with a reliable connection and then grab that using a
resumable transport such as HTTP.  A kind person made a service to do
that.

  http://thread.gmane.org/gmane.comp.version-control.git/181380
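
For concreteness, a rough sketch of the manual bundle approach (the repository
URL, filenames, and branch name below are only placeholders):

  # on a machine with a reliable connection to the repository
  git clone --mirror git://example.org/big.git big.git
  git --git-dir=big.git bundle create big.bundle --all

  # serve big.bundle over HTTP, then on the machine with the flaky link
  wget -c http://example.org/big.bundle        # -c resumes a partial download
  git clone -b master big.bundle big
  cd big
  git remote set-url origin git://example.org/big.git
  git fetch origin                             # catch up with anything newer than the bundle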

Hope that helps,
Jonathan


Re: Stalled git cloning and possible solutions

2013-08-29 Thread V.Krishn
On Friday, August 30, 2013 02:40:34 AM you wrote:
 V.Krishn wrote:
  Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans
  up what has been downloaded, and the process needs to be restarted.
  
  Is there a way to recover or continue from the already downloaded files
  during cloning?
 
 No, sadly.  The pack sent for a clone is generated dynamically, so
 there's no easy way to support the equivalent of an HTTP Range request
 to resume.  Someone might implement an appropriate protocol extension
 to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
 but for now it doesn't exist.

This is what I tried but then realized something more is needed:

During a stalled clone, avoid Ctrl+C.
1. Copy the content, i.e. the .git folder, to some other place.
2. cd new dir
3. git config fetch.unpackLimit 99
4. git config transfer.unpackLimit 99
5. cat .git/config  # to see if the config changes went in OK
 
6. recovery attempt:
 git unpack-objects -r --strict < .git/objects/pack/tmp_pack_0mSPsc

THEN... I hoped the following would do the trick:
 git pull
 OR
 git fetch-pack
 OR
 git repack + git pull

but then something more is needed :)
like an index/map file... etc. for it to work.
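
For what it's worth, after the unpack-objects step one can at least inspect
what was salvaged (a sketch using standard commands; it does not, by itself,
make a subsequent fetch reuse those objects):

  git count-objects -v   # how many loose objects were recovered into .git/objects
  git fsck --full        # reports the recovered objects as unreachable/dangling,
                         # since no refs point at them yet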

 
 What you *can* do today is create a bundle from the large repo
 somewhere with a reliable connection and then grab that using a
 resumable transport such as HTTP.  A kind person made a service to do
 that.
 
   http://thread.gmane.org/gmane.comp.version-control.git/181380

The service looks nice. Hope it gets sponsors to keep it running.

-- 
Regards.
V.Krishn


Re: Stalled git cloning and possible solutions

2013-08-29 Thread Junio C Hamano
V.Krishn vkris...@gmail.com writes:

 On Friday, August 30, 2013 02:40:34 AM you wrote:
 V.Krishn wrote:
  Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans
  up what has been downloaded, and the process needs to be restarted.
  
  Is there a way to recover or continue from the already downloaded files
  during cloning?
 
 No, sadly.  The pack sent for a clone is generated dynamically, so
 there's no easy way to support the equivalent of an HTTP Range request
 to resume.  Someone might implement an appropriate protocol extension
 to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
 but for now it doesn't exist.

 This is what I tried but then realized something more is needed:

 During a stalled clone, avoid Ctrl+C.
 1. Copy the content, i.e. the .git folder, to some other place.
 2. cd new dir
 3. git config fetch.unpackLimit 99
 4. git config transfer.unpackLimit 99

These two steps will not help, as negotiation between the sender and
the receiver is based on the commits that are known to be complete,
and an earlier failed fetch will not (and should not) update refs
on the receiver's side.

 What you *can* do today is create a bundle from the large repo
 somewhere with a reliable connection and then grab that using a
 resumable transport such as HTTP.

Yes.

Another possibility is, if the project being cloned has a tag (or a
branch) that points at a commit back when it was smaller, do this

git init x 
cd x 
git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small

to prime the object store of a temporary repository 'x' with a
hopefully smaller transfer, and then use it as a --reference
repository to the real clone.
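
A concrete sketch of that sequence (the URL and the old tag name are
placeholders):

  git init x
  cd x
  git fetch git://example.org/project.git v0.1:refs/tags/back_then_i_was_small
  cd ..

  # the real clone now only transfers what 'x' does not already have.
  # Note: the new clone borrows objects from 'x' via alternates, so keep 'x'
  # around, or run 'git repack -a -d' in the clone to copy them over.
  git clone --reference x git://example.org/project.git project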



Re: Stalled git cloning and possible solutions

2013-08-29 Thread V.Krishn
On Friday, August 30, 2013 03:48:44 AM you wrote:
 V.Krishn vkris...@gmail.com writes:
  On Friday, August 30, 2013 02:40:34 AM you wrote:
  V.Krishn wrote:
   Quite often, when cloning a large repo stalls, hitting Ctrl+C
   cleans up what has been downloaded, and the process needs to be restarted.
   
   Is there a way to recover or continue from the already downloaded files
   during cloning?
  
  No, sadly.  The pack sent for a clone is generated dynamically, so
  there's no easy way to support the equivalent of an HTTP Range request
  to resume.  Someone might implement an appropriate protocol extension
  to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
  but for now it doesn't exist.
  
  This is what I tried but then realized something more is needed:
  
  During a stalled clone, avoid Ctrl+C.
  1. Copy the content, i.e. the .git folder, to some other place.
  2. cd new dir
  3. git config fetch.unpackLimit 99
  4. git config transfer.unpackLimit 99
 
 These two steps will not help, as negotiation between the sender and
 the receiver is based on the commits that are known to be complete,
 and an earlier failed fetch will not (and should not) update refs
 on the receiver's side.
 
  What you *can* do today is create a bundle from the large repo
  somewhere with a reliable connection and then grab that using a
  resumable transport such as HTTP.
 
 Yes.
 
 Another possibility is, if the project being cloned has a tag (or a
 branch) that points at a commit back when it was smaller, do this
 
 git init x
 cd x
 git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small
 
 to prime the object store of a temporary repository 'x' with a
 hopefully smaller transfer, and then use it as a --reference
 repository to the real clone.

It would be nice if:
1. the clone process downloaded all files in .git before the blobs or the
packing process, added a lock file like .clone, and then started the packing
process;
2. any interrupt (Ctrl+C) did not delete the already downloaded files, and on
re-clone the process checked for the .clone file and resumed cloning;
3. upon finishing the clone, the .clone file were deleted.

-- 
Regards.
V.Krishn