Re: Dumb http servers still slow?

2005-08-01 Thread Darrin Thompson
On Sat, 2005-07-30 at 23:51 -0700, Junio C Hamano wrote:
> Darrin Thompson <[EMAIL PROTECTED]> writes:
> 
> > 1. Pack files should reduce the number of http round trips.
> > 2. What I'm seeing when I check out mainline git is the acquisition of a
> > single large pack, then 600+ more recent objects. Better than before,
> > but still hundreds of round trips.
> 
> I've packed the git.git repository, by the way.  It has 43
> unpacked objects totalling 224 kilobytes, so cloning over dumb
> http should go a lot faster until we accumulate more unpacked
> objects.

I did a pull from the office and the times were 27 sec for http and 17
sec for rsync. So the moral of the story should be that frequent repacks
are sufficient for decent http performance.
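
For anyone wanting to follow that advice, here is a minimal sketch of
the repack-and-publish routine a dumb http server needs (the command
names are the ones in later git; the thread itself only describes the
steps):

 $ git repack              ;# pack loose objects into a new incremental pack
 $ git prune-packed        ;# drop loose objects that are now in a pack
 $ git update-server-info  ;# refresh the info files dumb http clients read

The more often this runs on the server, the fewer loose objects a dumb
http clone has to fetch one round trip at a time.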

--
Darrin




Re: Dumb http servers still slow?

2005-07-30 Thread Junio C Hamano
Darrin Thompson <[EMAIL PROTECTED]> writes:

> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.

I've packed the git.git repository, by the way.  It has 43
unpacked objects totalling 224 kilobytes, so cloning over dumb
http should go a lot faster until we accumulate more unpacked
objects.

Some of you may have noticed that in the proposed updates queue
("pu" branch) I have a couple of commits related to pulling from
a packed dumb http server.  There are two "git fetch http://"
commits to let you pull from such, and another stupid "count
objects" script that you can use to see how many unpacked
objects you have in your repository; the latter is to help
you decide when to repack.
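
(For reference, that script is essentially what later git ships as the
"git count-objects" command; a quick sketch of using it to decide when
to repack, with the figures being just the ones quoted above:)

 $ git count-objects
 43 objects, 224 kilobytes

Once the loose-object count creeps into the hundreds, an incremental
repack keeps dumb http clones cheap again.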

Brave souls may want to try out the dumb http fetch.  For
example, it _should_ do the right thing even if you do the
following:

 $ git clone http://www.kernel.org/pub/scm/git/git.git/ newdir
 $ cd newdir
 $ mv .git/objects/pack/pack-* . ;# even if you unpack packs on your
 $ rm -f pack-*.idx  ;# end, it should do the right thing.
 $ for pack in pack-*.pack; do
       git-unpack-objects <"$pack"
       rm -f "$pack"
   done
 $ rm -f .git/refs/heads/pu
 $ git prune ;# lose objects in "pu" but still not in "master"
 $ git pull origin pu
 $ git ls-remote origin |
   while read sha1 refname
   do
       case "$refname" in
       refs/heads/master) echo "$sha1" >".git/$refname" ;;
       esac
   done ;# revert master to upstream master
 $ old=$(git-rev-parse master^^)
 $ echo "$old" >.git/refs/heads/master ;# rewind further
 $ git checkout -f master
 $ git prune ;# try losing a few more objects.
 $ git pull origin master
 $ git ls-remote ./. ;# show me my refs
 $ git ls-remote origin ;# show me his refs

Unlike my other shell scripts I usually write in my e-mail
buffer, I have actually run the above ;-).

-jc




Re: Dumb http servers still slow?

2005-07-29 Thread Junio C Hamano
Darrin Thompson <[EMAIL PROTECTED]> writes:

> Ok... so lets check my assumptions:
>
> 1. Pack files should reduce the number of http round trips.
> 2. What I'm seeing when I check out mainline git is the acquisition of a
> single large pack, then 600+ more recent objects. Better than before,
> but still hundreds of round trips.
> 3. If I wanted to further speed up the initial checkout on my own
> repositories I could frequently repack my most recent few hundred
> objects.
> 4. If curl had pipelining then less pack management would be needed.

All true.  Another possibility is to make multiple requests in
parallel; if curl does not do pipelining, either switch to
something that does, or have more than one process using curl.
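
(A rough illustration of the more-than-one-process idea, not what the
commit walker actually does; the URL list file here is hypothetical:)

 $ xargs -P 8 -n 1 curl -s -O <loose-object-urls.txt ;# 8 fetches at a time

Parallelism only hides the per-object latency, though; repacking
removes the round trips altogether.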

The dumb server preparation creates three files, two of which are
currently used by clone (one is the list of packs, the other the
list of branches and tags).  The third one is commit ancestry
information.  The commit walker could be taught to read it to
figure out what commits it still needs to fetch, without waiting
for each retrieved commit to be parsed.
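
(For concreteness, the two files clone already uses live in the
repository like this in later git; the thread does not name the
ancestry file, so it is left out here:)

 $GIT_DIR/info/refs          ;# "<sha1> TAB <refname>", one line per branch or tag
 $GIT_DIR/objects/info/packs ;# "P pack-<sha1>.pack", one line per pack on offer

Both are regenerated by the dumb server preparation step whenever the
refs or the set of packs change.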

Sorry, I am not planning to write that part myself.

One potential low hanging fruit is that even for cloning via
git:// URL we _might_ be better off starting with the dumb
server protocol; get the list of statically prepared packs and
obtain them upfront before starting the clone-pack/upload-pack
protocol pair.



Re: Dumb http servers still slow?

2005-07-29 Thread Darrin Thompson
On Fri, 2005-07-29 at 17:08 +0200, Radoslaw AstralStorm Szkodzinski
wrote:
> On Fri, 29 Jul 2005 09:57:36 -0500
> Darrin Thompson <[EMAIL PROTECTED]> wrote:
> 
> > Can't see the code.
> > 
> > http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
> > 
> > Internal Server Error
> > 
> 
> Use FTP.
> 

Duh. Thanks.

--
Darrin




Re: Dumb http servers still slow?

2005-07-29 Thread AstralStorm
On Fri, 29 Jul 2005 09:57:36 -0500
Darrin Thompson <[EMAIL PROTECTED]> wrote:

> Can't see the code.
> 
> http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi
> 
> Internal Server Error
> 

Use FTP.

-- 
AstralStorm

GPG Key ID = 0xD1F10BA2
GPG Key fingerprint = 96E2 304A B9C4 949A 10A0  9105 9543 0453 D1F1 0BA2
Please encrypt if you can.




Re: Dumb http servers still slow?

2005-07-29 Thread Darrin Thompson
On Fri, 2005-07-29 at 10:48 -0400, Ryan Anderson wrote:
> On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
> > 
> > Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> > it could benefit from some git-send-pack superpowers.
> 
> http://www.kernel.org/pub/software/scm/gitweb/
> 
> It occurs to me that pulling this into the main git repository might not
> be a bad idea, since it is currently living outside any revision
> tracking.
> 

Can't see the code.

http://www.kernel.org/pub/software/scm/gitweb/gitweb.cgi

Internal Server Error

--
Darrin




Re: Dumb http servers still slow?

2005-07-29 Thread Ryan Anderson
On Fri, Jul 29, 2005 at 09:03:41AM -0500, Darrin Thompson wrote:
> 
> Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
> it could benefit from some git-send-pack superpowers.

http://www.kernel.org/pub/software/scm/gitweb/

It occurs to me that pulling this into the main git repository might not
be a bad idea, since it is currently living outside any revision
tracking.

-- 

Ryan Anderson
  sometimes Pug Majere


Re: Dumb http servers still slow?

2005-07-29 Thread Darrin Thompson
On Thu, 2005-07-28 at 19:24 -0700, Junio C Hamano wrote:
> The thing is, the base pack for the git repository is 1.8MB
> currently containing 4500+ objects, while we accumulated 600+
> unpacked objects since then, which is about 5MB.  The commit
> walker needs to fetch the latter one by one in the old way.
> 
> When packed incrementally on top of the base pack, these 600+
> unpacked objects compress down to something like 400KB, and I
> was hoping we could wait until we accumulate enough to produce
> an incremental pack of about a meg or so ...

Ok... so lets check my assumptions:

1. Pack files should reduce the number of http round trips.
2. What I'm seeing when I check out mainline git is the acquisition of a
single large pack, then 600+ more recent objects. Better than before,
but still hundreds of round trips.
3. If I wanted to further speed up the initial checkout on my own
repositories I could frequently repack my most recent few hundred
objects.
4. If curl had pipelining then less pack management would be needed.

Where is the code for gitweb? (i.e. http://kernel.org/git ) Seems like
it could benefit from some git-send-pack superpowers.

--
Darrin




Re: Dumb http servers still slow?

2005-07-28 Thread Junio C Hamano
Darrin Thompson <[EMAIL PROTECTED]> writes:

> I just ran git clone against the mainline git repository using both http
> and rsync. http was still quite slow compared to rsync. I expected that
> the http time would be much faster than in the past due to the pack
> file.
>
> Is there something simple I'm missing?

No, the only thing you missed was that I did not write it to
make it fast, but just to make it work ;-).  The commit walker
simply does not work against a dumb http server repository that
is packed and prune-packed, which is already the case for both
kernel and git repositories.

The thing is, the base pack for the git repository is 1.8MB
currently containing 4500+ objects, while we accumulated 600+
unpacked objects since then, which is about 5MB.  The commit
walker needs to fetch the latter one by one in the old way.

When packed incrementally on top of the base pack, these 600+
unpacked objects compress down to something like 400KB, and I
was hoping we could wait until we accumulate enough to produce
an incremental pack of about a meg or so ...
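
(A sketch of that incremental step, assuming the command names of
later git; "git repack" without "-a" packs only the loose objects and
leaves the existing base pack alone:)

 $ git repack                        ;# new pack holding just the ~600 loose objects
 $ ls .git/objects/pack/pack-*.pack  ;# base pack plus the small incremental one

Clients that already have the base pack then only need to fetch the
incremental one.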




Dumb http servers still slow?

2005-07-28 Thread Darrin Thompson
Junio,

I just ran git clone against the mainline git repository using both http
and rsync. http was still quite slow compared to rsync. I expected that
the http time would be much faster than in the past due to the pack
file.

Is there something simple I'm missing?

--
Darrin

