Re: Dumb http servers still slow?
On Sat, 2005-07-30 at 23:51 -0700, Junio C Hamano wrote:
> Darrin Thompson <[EMAIL PROTECTED]> writes:
>
> > 1. Pack files should reduce the number of http round trips.
> > 2. What I'm seeing when I check out mainline git is the acquisition of a
> > single large pack, then 600+ more recent objects. Better than before,
> > but still hundreds of round trips.
>
> I've packed the git.git repository, by the way. It has 43
> unpacked objects totalling 224 kilobytes, so cloning over dumb
> http should go a lot faster until we accumulate more unpacked
> objects.

I did a pull from the office and the times were 27 sec for http and 17
sec for rsync. So the moral of the story should be that frequent repacks
are sufficient for decent http performance.

--
Darrin
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
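[Editor's note: the "frequent repacks" moral above can be sketched as a small
server-side maintenance step. This is a hedged sketch, not from the thread;
the helper name is made up, and modern command spellings are used where the
2005-era names were git-repack and git-update-server-info.]

```shell
#!/bin/sh
# Sketch of the "frequent repacks" maintenance step for a repository
# served over dumb http. The function name repack_for_dumb_http is an
# assumption for illustration, not a real git command.
repack_for_dumb_http() {
    # $1 = path to the repository's work tree
    ( cd "$1" &&
      git repack -d &&         # pack loose objects, delete the now-packed ones
      git update-server-info ) # rewrite info/refs and objects/info/packs,
                               # which dumb-http clients read
}
```

Run on the serving side whenever loose objects accumulate; a dumb-http
clone then grabs one pack instead of making hundreds of round trips.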
Re: Dumb http servers still slow?
Darrin Thompson <[EMAIL PROTECTED]> writes:

> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.

I've packed the git.git repository, by the way. It has 43 unpacked
objects totalling 224 kilobytes, so cloning over dumb http should go a
lot faster until we accumulate more unpacked objects.

Some of you may have noticed that in the proposed updates queue ("pu"
branch) I have a couple of commits related to pulling from a packed dumb
http server. There are two "git fetch http://" commits to let you pull
from such, and another stupid "count objects" script that you can use to
see how many unpacked objects you have in your repository; the latter is
to help you decide when to repack.

Brave souls may want to try out the dumb http fetch. For example, it
_should_ do the right thing even if you do the following:

    $ git clone http://www.kernel.org/pub/scm/git/git.git/ newdir
    $ cd newdir
    $ mv .git/objects/pack/pack-* . ;# even if you unpack packs on your
    $ rm -f pack-*.idx              ;# end, it should do the right thing.
    $ for pack in pack-*.pack
      do
          git-unpack-objects <$pack
          rm -f "$pack"
      done
    $ rm -f .git/refs/heads/pu
    $ git prune ;# lose objects in "pu" but still not in "master"
    $ git pull origin pu
    $ git ls-remote origin |
      while read sha1 refname
      do
          case "$refname" in
          refs/heads/master)
              echo $sha1 >".git/$refname" ;;
          esac
      done ;# revert master to upstream master
    $ old=$(git-rev-parse master^^)
    $ echo "$old" >.git/refs/heads/master ;# rewind further
    $ git checkout -f master
    $ git prune ;# try losing a bit more objects.
    $ git pull origin master
    $ git ls-remote ./. ;# show me my refs
    $ git ls-remote origin ;# show me his refs

Unlike my other shell scripts I usually write in my e-mail buffer, I
have actually run the above ;-).
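[Editor's note: the "count objects" script itself is not shown in the mail.
A minimal sketch of the idea, assuming only the standard layout where loose
objects live one file per object under two-hex-digit fan-out directories in
.git/objects/; the function name and the demo directory/file names are made
up for illustration.]

```shell
#!/bin/sh
# Count loose (unpacked) objects in an objects directory, to help decide
# when a repack is worthwhile. count_loose is a hypothetical helper name.
count_loose() {
    # $1 = path to an objects directory
    # Loose objects sit under xx/ where xx is two hex digits; the pack/
    # and info/ subdirectories do not match that pattern and are skipped.
    find "$1" -type f -path "$1/[0-9a-f][0-9a-f]/*" | wc -l
}

# demo on a mock objects directory
mock=$(mktemp -d)
mkdir -p "$mock/ab" "$mock/cd" "$mock/pack"
touch "$mock/ab/o1" "$mock/ab/o2" "$mock/cd/o3" "$mock/pack/p.pack"
count_loose "$mock"    # prints 3 (the pack file is not counted)
rm -rf "$mock"
```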
-jc
Re: Dumb http servers still slow?
Darrin Thompson <[EMAIL PROTECTED]> writes:

> Ok... so let's check my assumptions:
>
> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.
> 3. If I wanted to further speed up the initial checkout on my own
> repositories I could frequently repack my most recent few hundred
> objects.
> 4. If curl had pipelining then less pack management would be needed.

All true. Another possibility is to make multiple requests in parallel;
if curl does not do pipelining, either switch to something that does, or
have more than one process using curl.

The dumb server preparation creates three files, two of which are
currently used by clone (one is the list of packs, the other is the list
of branches and tags). The third one is commit ancestry information. The
commit walker could be taught to read it to figure out what commits it
still needs to fetch without waiting for the commit being retrieved to
be parsed. Sorry, I am not planning to write that part myself.

One potential low hanging fruit is that even for cloning via git:// URL
we _might_ be better off starting with the dumb server protocol; get the
list of statically prepared packs and obtain them upfront before
starting the clone-pack/upload-pack protocol pair.
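[Editor's note: the "more than one process using curl" idea above can be
sketched with xargs fanning a URL list out to parallel workers. This is a
hedged illustration, not code from the thread; FETCH, the file name, and
the worker count of 2 are all assumptions, and the demo substitutes echo
for curl so that nothing touches the network.]

```shell
#!/bin/sh
# Parallel object fetching without pipelining: run several fetch
# processes at once over a list of URLs.
FETCH="${FETCH:-curl -s -O}"

fetch_all() {
    # $1 = file with one URL per line, $2 = number of parallel workers
    xargs -n 1 -P "$2" $FETCH <"$1"
}

# demo with a stub command instead of real network fetches
urls=$(mktemp)
printf 'obj-aa\nobj-bb\nobj-cc\n' >"$urls"
FETCH=echo
fetch_all "$urls" 2    # prints the three names, up to two fetches at a time
rm -f "$urls"
```

The same pattern works with any per-URL fetch command; order of
completion is not guaranteed, which is fine for content-addressed
objects.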
Re: Dumb http servers still slow?
On Fri, 2005-07-29 at 17:08 +0200, Radoslaw AstralStorm Szkodzinski wrote:
> On Fri, 29 Jul 2005 09:57:36 -0500
> Darrin Thompson <[EMAIL PROTECTED]> wrote:
>
> > Can't see the code.
> >
> > http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
> >
> > Internal Server Error
>
> Use FTP.

Duh. Thanks.

--
Darrin
Re: Dumb http servers still slow?
On Fri, 29 Jul 2005 09:57:36 -0500 Darrin Thompson <[EMAIL PROTECTED]> wrote:

> Can't see the code.
>
> http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
>
> Internal Server Error

Use FTP.

--
AstralStorm

GPG Key ID = 0xD1F10BA2
GPG Key fingerprint = 96E2 304A B9C4 949A 10A0 9105 9543 0453 D1F1 0BA2
Please encrypt if you can.
Re: Dumb http servers still slow?
On Fri, 2005-07-29 at 10:48 -0400, Ryan Anderson wrote:
> On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
> >
> > Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> > it could benefit from some git-send-pack superpowers.
>
> http://www.kernel.org/pub/software/scm/gitweb/
>
> It occurs to me that pulling this into the main git repository might not
> be a bad idea, since it is currently living outside any revision
> tracking at the moment.

Can't see the code.

http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi

Internal Server Error

--
Darrin
Re: Dumb http servers still slow?
On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
>
> Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> it could benefit from some git-send-pack superpowers.

http://www.kernel.org/pub/software/scm/gitweb/

It occurs to me that pulling this into the main git repository might not
be a bad idea, since it is currently living outside any revision
tracking.

--
Ryan Anderson
sometimes Pug Majere
Re: Dumb http servers still slow?
On Thu, 2005-07-28 at 19:24 -0700, Junio C Hamano wrote:
> The thing is, the base pack for the git repository is 1.8MB
> currently containing 4500+ objects, while we accumulated 600+
> unpacked objects since then which is about ~5MB. The commit
> walker needs to fetch the latter one by one in the old way.
>
> When packed incrementally on top of the base pack, these 600+
> unpacked objects compress down to something like 400KB, and I
> was hoping we could wait until we accumulate enough to produce
> an incremental about a meg or so ...

Ok... so let's check my assumptions:

1. Pack files should reduce the number of http round trips.
2. What I'm seeing when I check out mainline git is the acquisition of a
single large pack, then 600+ more recent objects. Better than before,
but still hundreds of round trips.
3. If I wanted to further speed up the initial checkout on my own
repositories I could frequently repack my most recent few hundred
objects.
4. If curl had pipelining then less pack management would be needed.

Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
it could benefit from some git-send-pack superpowers.

--
Darrin
Re: Dumb http servers still slow?
Darrin Thompson <[EMAIL PROTECTED]> writes:

> I just ran git clone against the mainline git repository using both http
> and rsync. http was still quite slow compared to rsync. I expected that
> the http time would be much faster than in the past due to the pack
> file.
>
> Is there something simple I'm missing?

No, the only thing you missed was that I did not write it to make it
fast, but just to make it work ;-). The commit walker simply does not
work against a dumb http server repository that is packed and
prune-packed, which is already the case for both kernel and git
repositories.

The thing is, the base pack for the git repository is 1.8MB currently
containing 4500+ objects, while we accumulated 600+ unpacked objects
since then, which is about ~5MB. The commit walker needs to fetch the
latter one by one in the old way.

When packed incrementally on top of the base pack, these 600+ unpacked
objects compress down to something like 400KB, and I was hoping we could
wait until we accumulate enough to produce an incremental about a meg or
so ...
Dumb http servers still slow?
Junio,

I just ran git clone against the mainline git repository using both http
and rsync. http was still quite slow compared to rsync. I expected that
the http time would be much faster than in the past due to the pack
file.

Is there something simple I'm missing?

--
Darrin