Re: [RFD] server-info to help clients

2005-07-19 Thread Junio C Hamano
The management of multiple packs and strategy of deciding when
to create the next incremental (be it throw-away or permanent)
is something I am not particularly interested in at this moment,
and as you correctly pointed out, the "single throw-away pack"
is an example of _bad_ strategy [*1*].  I am more interested in
designing a concise way to express what the server side has
(after applying such packing/repacking strategy) to help clients
coming over a dumb transport.

One thing that I forgot to mention is that there is another
piece of per-repository information, "$repo/info/revinfo".  This
lists the commit ancestry reachable from "$repo/refs".  Clients
need it to walk back from the very tips of branches, which are
likely not packed yet, to the closest commit that appears as a
head in "$repo/objects/info/pack", and go from there.


[Footnote]

*1* As I said, I am not interested in thinking about this at
this moment, but I suspect a scheme that employs the base pack,
permanent incrementals, and N new throw-aways every day for the
N-th day of the month may work reasonably well.

On the N-th day of the month, you create incrementals relative
to what existed on the N-1th, N-2th, ..., 1st of the month.  At
the end of the day, create N+1 new throw-aways for the N+1th day
of the month (you can garbage collect older days' throw-away
incrementals whenever you like).  At the end of the month, you
mark the throw-away incremental that is relative to the
beginning of the month as the latest permanent incremental.

Bootstrappers can slurp the base, the permanent incrementals, and
today's throw-away that is relative to the last permanent
incremental.  Updaters can pick the one relative to the day they
last updated.
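
As a rough illustration of the scheme in this footnote, here is a small
Python sketch.  It is only a model of the schedule, not anything git
provides: the function names and the pack labels are made up, and it
assumes that "relative to the beginning of the month" means the
throw-away relative to the 1st.

```python
def throwaway_bases(day):
    """Days of the month that the throw-aways available on `day` are relative to.

    Follows the footnote: on the N-th day there are incrementals relative
    to the N-1th, N-2th, ..., 1st of the month.
    """
    if day < 1:
        raise ValueError("day of month starts at 1")
    return list(range(day - 1, 0, -1))


def pick_for_updater(today, last_update_day):
    """Base day of the throw-away an updater would fetch, or None if none fits."""
    if last_update_day >= today:
        return None  # nothing newer to fetch within this month
    bases = throwaway_bases(today)
    return last_update_day if last_update_day in bases else None


def packs_for_bootstrap(today, permanent_count):
    """Hypothetical labels for what a bootstrapper would slurp."""
    names = ["base"]
    names += ["permanent-%d" % i for i in range(1, permanent_count + 1)]
    # Assumption: the throw-away relative to the 1st covers everything
    # since the last permanent incremental.
    if 1 in throwaway_bases(today):
        names.append("throwaway-rel-day-1")
    return names


bases_on_4th = throwaway_bases(4)
updater_choice = pick_for_updater(10, 7)
bootstrap_list = packs_for_bootstrap(5, 2)
```

An updater who last pulled on the 7th and comes back on the 10th picks
the throw-away relative to the 7th; a bootstrapper ignores all the
others and takes only the one relative to the start of the month.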



Re: [RFD] server-info to help clients

2005-07-19 Thread David Lang

I wonder how much benefit there is to the throw-away packs.

If you do permanent incremental packs every day (or every few days), is
there really enough activity to make the throw-away packs worth the added
complexity (specifically, detecting on the client side that a pack is a
throw-away and therefore probably not worth keeping) for the slight
performance increase you may get?


Remember that since deltas only work within a pack, the throw-away pack
will only be noticeably smaller once one file has been modified multiple
times before a new incremental pack is created.  So you aren't likely to
save much space; all you are likely to save is the overhead of fetching
multiple objects compared to one object.


Going forward, it may be worth writing a smarter packing program to
support HPA's goal of a central object storage, one that can make
decisions like: 'object A is part of this 40% of the trees, while object
B is part of that other 40% (disjoint set), so it's probably a good idea
to put them into separate packs'.


But that can be done much further down the road without having to change
the clients at all.


David Lang


On Tue, 19 Jul 2005, Junio C Hamano wrote:

Date: Tue, 19 Jul 2005 17:20:58 -0700
From: Junio C Hamano <[EMAIL PROTECTED]>
To: Linus Torvalds <[EMAIL PROTECTED]>
Cc: git@vger.kernel.org, [EMAIL PROTECTED]
Subject: [RFD] server-info to help clients

While things are quiet (I envy everybody having fun at OLS),
I've been cooking something to help clients to pull from dumb
servers.

I assume that:

- The object database is packed, following the recommendations
  in the "Working with Others" section of the tutorial.

- The repository owner _may_ further create throw-away
  incremental packs.  There can be the following in one object
  database:

    - one baseline pack.
    - permanent incremental packs #1 .. #N
    - one throw-away incremental pack.
    - unpacked files under objects/??/.

  Baseline and permanent incremental packs are built by "git
  repack", just like Linus recommended from the beginning.  The
  throw-away pack is built periodically (say every hour) to
  collect all objects that are in neither the baseline nor the
  permanent incrementals.  Building such a throw-away pack
  involves:

    - unpacking and removal of the current throw-away pack.
    - running "git repack".
    - running "git prune-packed".

- The server could be truly dumb and can even refuse to serve
  dirindex; parsing autogenerated index.html is a pain anyway.

First, a somewhat related change I did was to write a script
called "git ls-remote".  It is used this way:

   $ git ls-remote origin
   17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
   17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
   f0b32737ad5a35cc047db47353a75faccfe5939e refs/heads/linus
   4d9ae497491fd838dafd7fcbd11c4aa678a726f1 refs/heads/pu
   d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
   f25a265a342aed6041ab0cc484224d9ca54b6f41 refs/tags/v0.99.1

It slurps the set of refs from a remote repository (the same
short-hand we stole from Cogito using .git/branches/ can be used
here) and optionally it can be told to store tags under local
refs/.

This is produced by connecting directly to the git-daemon
running on the remote side and talking upload-pack protocol with
it.  A new helper program "git-peek-remote" is used to do this
when we use a git:// URL.  From an rsync URL, everything under its
refs/ is copied to a temporary directory to produce the same
information.
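
To show how a client-side script might consume the listing above, here
is a minimal Python sketch.  It assumes only what the sample shows: one
ref per line, a 40-hex-digit object name, whitespace, then the ref
name.  The function name parse_ls_remote is made up for this example.

```python
HEX_DIGITS = set("0123456789abcdef")


def parse_ls_remote(text):
    """Map ref name -> object name from "git ls-remote"-style output."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha1, name = line.split(None, 1)
        if len(sha1) != 40 or not set(sha1) <= HEX_DIGITS:
            raise ValueError("not an object name: %r" % sha1)
        refs[name] = sha1
    return refs


# The sample output quoted in the message.
SAMPLE = """\
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
f0b32737ad5a35cc047db47353a75faccfe5939e refs/heads/linus
4d9ae497491fd838dafd7fcbd11c4aa678a726f1 refs/heads/pu
d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
f25a265a342aed6041ab0cc484224d9ca54b6f41 refs/tags/v0.99.1
"""

refs = parse_ls_remote(SAMPLE)
heads = sorted(name for name in refs if name.startswith("refs/heads/"))
```

In the sample, HEAD and refs/heads/master carry the same object name,
so a lookup of either key gives the same commit.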

To support the same on a dumb transport, I gave the server side
a new command, "git update-server-info", which prepares this
information in "$repo/info/refs", so writing http support for
"git ls-remote" using curl is trivial.  I arranged things so
that update-server-info is run whenever you push into the
repository via "git push".  You can of course run it by hand
from the command line.

The other file that update-server-info produces is to help dumb
pullers.  It is stored in "$repo/objects/info/pack", and looks
like this:

   P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
   P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
   D 0 1
   D 1
   T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
   T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
   T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
   T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
   T 1 06f6d9e2f140466eeb41e494e14167f90210f89d tag
   T 1 26791a8bcf0e6d33f43aef7682bdb555236d56de tag
   T 1 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c tag
   T 1 701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e tag
   T 1 733ad933f62e82ebc92fed988c7f0795e64dea62 tag
   T 1 9e734775f7c22d2f89943ad6c745571f1930105f tag
   T 1 c521cb0f10ef2bf28a18e1cc8adf378ccbbe5a19 tag
   T 1 ebb5573ea8beaf000d4833735f3e53acb9af844c tag
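
A listing like the one above can be split mechanically by its leading
letter.  The Python sketch below does only that and records the fields
without interpreting them.  The name parse_pack_info is made up, the
sample is a shortened copy of the listing, and treating the numbers on
the D and T lines as 0-based indexes into the P lines is an assumption
read off the sample rather than something this sketch checks.

```python
def parse_pack_info(text):
    """Split a pack info listing into its P, D and T records."""
    packs, d_lines, t_lines = [], [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        kind, rest = fields[0], fields[1:]
        if kind == "P":
            packs.append(rest[0])  # pack file name
        elif kind == "D":
            d_lines.append([int(x) for x in rest])  # numbers, kept as given
        elif kind == "T":
            index, sha1, objtype = rest
            t_lines.append((int(index), sha1, objtype))
        else:
            raise ValueError("unknown line type: %r" % kind)
    return packs, d_lines, t_lines


# Shortened copy of the sample in the message.
SAMPLE = """\
P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
D 0 1
D 1
T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
"""

packs, d_lines, t_lines = parse_pack_info(SAMPLE)
```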

The lines that start with a 'P' list all the packs available in
this object database (relative to $repo/objects/pack).  These
packs are implicitly numbered starting a