Re: Stalled git cloning and possible solutions
On Friday, August 30, 2013 03:48:44 AM you wrote:
> V.Krishn <vkris...@gmail.com> writes:
> > On Friday, August 30, 2013 02:40:34 AM you wrote:
> > > V.Krishn wrote:
> > > > Quite often, when cloning a large repo stalls, hitting Ctrl+C
> > > > cleans up what has been downloaded, and the process needs a
> > > > restart. Is there a way to recover or continue from the already
> > > > downloaded files during cloning?
> > >
> > > No, sadly. The pack sent for a clone is generated dynamically, so
> > > there's no easy way to support the equivalent of an HTTP Range
> > > request to resume. Someone might implement an appropriate protocol
> > > extension to tackle this (e.g., peff's seed-with-clone.bundle
> > > hack) some day, but for now it doesn't exist.
> >
> > This is what I tried, but then realized something more is needed:
> >
> > During a stalled clone, avoid Ctrl+C.
> > 1. Copy the contents, i.e. the .git folder, somewhere else.
> > 2. cd into the new dir.
> > 3. git config fetch.unpackLimit 99
> > 4. git config transfer.unpackLimit 99
>
> These two steps will not help, as negotiation between the sender and
> the receiver is based on the commits that are known to be complete,
> and an earlier failed fetch will not (and should not) update refs on
> the receiver's side.
>
> > > What you *can* do today is create a bundle from the large repo
> > > somewhere with a reliable connection and then grab that using a
> > > resumable transport such as HTTP.
>
> Yes. Another possibility, if the project being cloned has a tag (or a
> branch) that points at a commit back when it was smaller, is to do
> this:
>
>     git init x
>     cd x
>     git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small
>
> to prime the object store of a temporary repository 'x' with a
> hopefully smaller transfer, and then use it as a --reference
> repository for the real clone.

What more files/info would be needed? I noticed the tmp_pack_xx may
not contain objects of type commit/tree. Do I need to manually create
.git/refs..?

I was wondering whether the following would further help with recovery:

A.
1. Create the pack file in commit-history (date) order, i.e.
   blob+commit+tree, tags, ..., blob+commit+tree, and in parallel
   create the idx as well, or at least a temporary idx.
2. Update the other files in the .git dir before the pack process
   (as stated in my previous email).
3. Name objects like datestamp(epoch)+sha1 and store them in a
   per-epoch directory (date format can be yymmdd). (This might break
   backward compatibility.)
4. Add `git fsck --defrag [1..4]`; this could take another parameter,
   like a level, applying various heuristic optimizations.

B.
Another option would be:

    git clone <url> --use-method=rsync

This would transfer the files in the .git dir as-is (the necessary
ones), then run `git gc` or other housekeeping upon completion. This
method would allow resuming. Cons: any change to a pack file on the
server during the download becomes a potential issue.

Clone resume may not be a priority, but if minor changes can help with
recovery, that would be nice. I still like the bundle method, if git
services made it easy.

-- 
Regards.
V.Krishn
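(As a rough illustration only: option B amounts to what one can
approximate today by copying a bare repository with plain rsync,
assuming the server exposes it over rsync/ssh. The --use-method=rsync
flag does not exist, and the host and path below are made up:

    # resumable raw copy of the bare repository; -P keeps partial
    # files so an interrupted transfer can continue where it left off
    rsync -avP example.com:/srv/git/project.git .

    # once the copy completes, clone locally and let git do its
    # usual housekeeping
    git clone project.git project

As noted above, a repack on the server mid-transfer can leave the
copy inconsistent, which is the stated con of this method.)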
Re: Stalled git cloning and possible solutions
On Fri, Aug 30, 2013 at 4:10 AM, Jonathan Nieder <jrnie...@gmail.com> wrote:
> V.Krishn wrote:
> > Quite often, when cloning a large repo stalls, hitting Ctrl+C
> > cleans up what has been downloaded, and the process needs a
> > restart. Is there a way to recover or continue from the already
> > downloaded files during cloning?
>
> No, sadly. The pack sent for a clone is generated dynamically, so
> there's no easy way to support the equivalent of an HTTP Range
> request to resume. Someone might implement an appropriate protocol
> extension to tackle this (e.g., peff's seed-with-clone.bundle hack)
> some day, but for now it doesn't exist.

OK, how about a new capability "resume" for upload-pack? fetch-pack
can then send the capability resume[=SHA-1,skip] to upload-pack. The
first time, it sends "resume" without parameters, and upload-pack
sends back a SHA-1 that identifies the pack being transferred,
together with a full pack as usual. When an early disconnection
happens, fetch-pack sends the received SHA-1 and the size of the pack
received so far. It then either receives the remaining part or a full
pack.

When upload-pack gets "resume", it calculates a checksum of all input
that may impact pack generation. If the checksum matches the SHA-1
from fetch-pack, it continues to generate the pack as usual but skips
sending the first "skip" bytes (maybe with a fake header so that
fetch-pack realizes this is a partial pack). If the checksum does not
match, it sends the full pack again. I count on index-pack to spot a
resumed pack corrupted by bugs.

The input to calculate the SHA-1 checksum includes:

 - the resulting SHA-1 list from rev-list
 - the git version string
 - .git/shallow
 - the replace object database
 - pack.* config
 - maybe some other variables (I haven't checked pack-objects)

Another Git implementation could generate this SHA-1 in a totally
different way and may even cache the generated pack. If, at resume
time, a load balancer directs the request to another upload-pack that
generates this SHA-1 differently, then this won't work (i.e., a full
pack is returned).

In a busy repository some refs may have moved, so the rev-list result
at resume time won't match any more, but we can deal with that later
by relaxing the rules to allow "want" lines with any SHA-1 reachable
from the current refs, not just one of the refs (pack v4 or
reachability bitmaps would help).
-- 
Duy
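(As a rough sketch of where such a checksum could come from; nothing
like this is implemented, and the input set above is explicitly a
guess, so this merely hashes the concatenation of those inputs:

    # hash every input that could change the generated pack
    {
        git rev-list --objects --all       # object list to be packed
                                           # (approximated with --all
                                           # for a full clone)
        git version                        # generator version string
        cat .git/shallow 2>/dev/null       # shallow boundaries, if any
        git for-each-ref refs/replace/     # replace object database
        git config --get-regexp '^pack\.'  # pack.* configuration
    } | sha1sum
)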
Re: Stalled git cloning and possible solutions
On Fri, Aug 30, 2013 at 7:17 PM, Duy Nguyen <pclo...@gmail.com> wrote:
> OK, how about a new capability "resume" for upload-pack? [...]
>
> When upload-pack gets "resume", it calculates a checksum of all
> input that may impact pack generation. If the checksum matches the
> SHA-1 from fetch-pack, it continues to generate the pack as usual
> but skips sending the first "skip" bytes [...]
>
> The input to calculate the SHA-1 checksum includes:
>
>  - the resulting SHA-1 list from rev-list
>  - the git version string
>  - .git/shallow
>  - the replace object database
>  - pack.* config
>  - maybe some other variables (I haven't checked pack-objects)

I should have tested something before I wrote that: --threads adds
some randomness to pack generation, so it would have to be
--threads=1. I'm not sure git repository hosts would be happy with
that...

> Another Git implementation could generate this SHA-1 in a totally
> different way and may even cache the generated pack. [...]
-- 
Duy
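(For what it's worth, the thread count is already configurable, so a
server wanting deterministic pack output for a scheme like this could
pin it in config, at the cost of a slower delta search:

    # equivalent of always passing --threads=1 to pack-objects
    git config pack.threads 1
)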
Stalled git cloning and possible solutions
Hi,

Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans
up what has been downloaded, and the process needs a restart. Is
there a way to recover or continue from the already downloaded files
during cloning?

Please point me to an archive URL if a solution exists (though I will
continue to search the archives as I send this email).

Can there be something like:

    git clone <url> --use-method=rsync

-- 
Regards.
V.Krishn
Re: Stalled git cloning and possible solutions
V.Krishn wrote:
> Quite often, when cloning a large repo stalls, hitting Ctrl+C cleans
> up what has been downloaded, and the process needs a restart. Is
> there a way to recover or continue from the already downloaded files
> during cloning?

No, sadly. The pack sent for a clone is generated dynamically, so
there's no easy way to support the equivalent of an HTTP Range request
to resume. Someone might implement an appropriate protocol extension
to tackle this (e.g., peff's seed-with-clone.bundle hack) some day,
but for now it doesn't exist.

What you *can* do today is create a bundle from the large repo
somewhere with a reliable connection and then grab that using a
resumable transport such as HTTP. A kind person made a service to do
that: http://thread.gmane.org/gmane.comp.version-control.git/181380

Hope that helps,
Jonathan
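(Concretely, the bundle workaround looks something like the
following; the bundle filename and URL are only placeholders:

    # on a machine with a reliable connection to the upstream repo
    git clone --mirror <url> project.git
    cd project.git
    git bundle create ../project.bundle --all

    # on the machine with the flaky link: HTTP can resume with wget -c
    wget -c http://example.com/project.bundle
    git clone project.bundle project
    cd project
    git remote set-url origin <url>   # point origin at the real repo
    git fetch origin                  # catch up past the bundle
)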
Re: Stalled git cloning and possible solutions
On Friday, August 30, 2013 02:40:34 AM you wrote:
> V.Krishn wrote:
> > Quite often, when cloning a large repo stalls, hitting Ctrl+C
> > cleans up what has been downloaded, and the process needs a
> > restart. Is there a way to recover or continue from the already
> > downloaded files during cloning?
>
> No, sadly. The pack sent for a clone is generated dynamically, so
> there's no easy way to support the equivalent of an HTTP Range
> request to resume. Someone might implement an appropriate protocol
> extension to tackle this (e.g., peff's seed-with-clone.bundle hack)
> some day, but for now it doesn't exist.

This is what I tried, but then realized something more is needed:

During a stalled clone, avoid Ctrl+C.
1. Copy the contents, i.e. the .git folder, somewhere else.
2. cd into the new dir.
3. git config fetch.unpackLimit 99
4. git config transfer.unpackLimit 99
5. cat .git/config  # to see if the config took
6. Recover what we can (unpack-objects reads the pack from stdin):
   git unpack-objects -r --strict < .git/objects/pack/tmp_pack_0mSPsc

THEN... I hoped one of the following would do the trick:

   git pull
   OR git fetch-pack
   OR git repack + git pull

but something more is needed :) like an index/map file, etc., for it
to work.

> What you *can* do today is create a bundle from the large repo
> somewhere with a reliable connection and then grab that using a
> resumable transport such as HTTP. A kind person made a service to do
> that: http://thread.gmane.org/gmane.comp.version-control.git/181380

The service looks nice. I hope it gets sponsors to keep it running.

-- 
Regards.
V.Krishn
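(A quick way to check what step 6 actually recovered is to inspect
the resulting loose object store; the recovered objects will show up
as dangling until some ref reaches them:

    git count-objects -v    # how many loose objects were unpacked
    git fsck --lost-found   # list the dangling commits/blobs/trees
)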
Re: Stalled git cloning and possible solutions
V.Krishn <vkris...@gmail.com> writes:
> On Friday, August 30, 2013 02:40:34 AM you wrote:
> > V.Krishn wrote:
> > > Quite often, when cloning a large repo stalls, hitting Ctrl+C
> > > cleans up what has been downloaded, and the process needs a
> > > restart. Is there a way to recover or continue from the already
> > > downloaded files during cloning?
> >
> > No, sadly. The pack sent for a clone is generated dynamically, so
> > there's no easy way to support the equivalent of an HTTP Range
> > request to resume. Someone might implement an appropriate protocol
> > extension to tackle this (e.g., peff's seed-with-clone.bundle
> > hack) some day, but for now it doesn't exist.
>
> This is what I tried, but then realized something more is needed:
>
> During a stalled clone, avoid Ctrl+C.
> 1. Copy the contents, i.e. the .git folder, somewhere else.
> 2. cd into the new dir.
> 3. git config fetch.unpackLimit 99
> 4. git config transfer.unpackLimit 99

These two steps will not help, as negotiation between the sender and
the receiver is based on the commits that are known to be complete,
and an earlier failed fetch will not (and should not) update refs on
the receiver's side.

> > What you *can* do today is create a bundle from the large repo
> > somewhere with a reliable connection and then grab that using a
> > resumable transport such as HTTP.

Yes. Another possibility, if the project being cloned has a tag (or a
branch) that points at a commit back when it was smaller, is to do
this:

    git init x
    cd x
    git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small

to prime the object store of a temporary repository 'x' with a
hopefully smaller transfer, and then use it as a --reference
repository for the real clone.
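(Spelled out end to end, and assuming $that_tag is whatever old,
small-history tag the project happens to have, the two-step clone
would be:

    git init x
    cd x
    git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small
    cd ..
    # the real clone now only transfers objects that 'x' lacks
    git clone --reference x $that_repository project

One caveat: with --reference the new clone borrows objects from 'x',
so 'x' must be kept alive, or the borrowing undone later with
`git repack -a -d` plus removal of .git/objects/info/alternates,
before 'x' is deleted.)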
Re: Stalled git cloning and possible solutions
On Friday, August 30, 2013 03:48:44 AM you wrote:
> V.Krishn <vkris...@gmail.com> writes:
> [...]
>
> Yes. Another possibility, if the project being cloned has a tag (or
> a branch) that points at a commit back when it was smaller, is to do
> this:
>
>     git init x
>     cd x
>     git fetch $that_repository $that_tag:refs/tags/back_then_i_was_small
>
> to prime the object store of a temporary repository 'x' with a
> hopefully smaller transfer, and then use it as a --reference
> repository for the real clone.

It would be nice if:

1. The clone process downloaded all the files in .git before the
   blob/packing step, and added a lock file like .clone before
   starting the packing process.
2. An interrupt (Ctrl+C) did not delete the already downloaded files;
   on a re-clone, the process would check for the .clone file and
   resume cloning.
3. Upon finishing the clone, the .clone file would be deleted.

-- 
Regards.
V.Krishn