i wonder how much benefit there is to the throw-away packs.
if you do permanent incremental packs every day (or every few days), is
there really enough activity to make it worth the added complexity
(specifically including detecting on the client side that a pack is a
throw-away pack, and therefore one you probably don't want to keep) for the
slight performance increase you may get?
remember that since deltas only work within a pack, the throw-away pack
will only be noticeably smaller once you start having one file modified
multiple times before a new incremental pack is created. so you aren't
likely to save much on space; all you are likely to save is the overhead
of fetching multiple objects compared to one object.
going forward it may be worth writing a smarter packing program to support
HPA's goal of a central object storage, one that can make decisions like:
'object A is part of this 40% of the trees, while object B is part of that
other 40% (disjoint set), so it's probably a good idea to put them into
separate packs'
but that can be done much further down the road without having to change
the clients at all.
David Lang
On Tue, 19 Jul 2005, Junio C Hamano wrote:
Date: Tue, 19 Jul 2005 17:20:58 -0700
From: Junio C Hamano <[EMAIL PROTECTED]>
To: Linus Torvalds <[EMAIL PROTECTED]>
Cc: git@vger.kernel.org, [EMAIL PROTECTED]
Subject: [RFD] server-info to help clients
While things are quiet (I envy everybody having fun at OLS),
I've been cooking something to help clients to pull from dumb
servers.
I assume that:

 - The object database is packed, following the recommendations
   in the "Working with Others" section of the tutorial.

 - The repository owner _may_ further create throw-away
   incremental packs. There can be the following in one object
   database:

   - one baseline pack.
   - permanent incremental packs #1 .. #N
   - one throw-away incremental pack.
   - unpacked files under objects/??/.

   Baseline and permanent incremental packs are built by "git
   repack", just like Linus recommended from the beginning. The
   throwaway pack is built periodically (say every hour) to
   collect all objects that are not in the baseline nor
   permanent incrementals. Building of such a throw-away pack
   involves:

   - unpacking and removal of the current throw-away pack.
   - running "git repack".
   - running "git prune-packed".

 - The server could be truly dumb and can even refuse to serve
   dirindex; parsing autogenerated index.html is a pain anyway.

First, a somewhat related change I did was to write a script
called "git ls-remote". It is used this way:
$ git ls-remote origin
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
f0b32737ad5a35cc047db47353a75faccfe5939e refs/heads/linus
4d9ae497491fd838dafd7fcbd11c4aa678a726f1 refs/heads/pu
d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
f25a265a342aed6041ab0cc484224d9ca54b6f41 refs/tags/v0.99.1
It slurps the set of refs from a remote repository (the same
short-hand we stole from Cogito using .git/branches/ can be used
here) and optionally it can be told to store tags under local
refs/.
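
As a rough illustration of how a client script might consume output like
the listing above, here is a minimal Python sketch. The function names
(parse_ls_remote, tags_only) are made up for this example and are not part
of git; the only assumption about the format is that each line is a
40-hex-digit object name, whitespace, and a ref name.

```python
# Hypothetical helper names; only the "<sha1> <refname>" line shape is
# taken from the sample output in the message.
HEX_DIGITS = set("0123456789abcdef")

SAMPLE = """\
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
f0b32737ad5a35cc047db47353a75faccfe5939e refs/heads/linus
4d9ae497491fd838dafd7fcbd11c4aa678a726f1 refs/heads/pu
d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
f25a265a342aed6041ab0cc484224d9ca54b6f41 refs/tags/v0.99.1
"""


def parse_ls_remote(text):
    """Return a dict mapping ref name -> object name."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        sha1, name = line.split(None, 1)  # split on the first run of whitespace
        if len(sha1) != 40 or not set(sha1) <= HEX_DIGITS:
            raise ValueError("not a SHA-1 object name: %r" % sha1)
        refs[name] = sha1
    return refs


def tags_only(refs):
    """Keep only the entries under refs/tags/."""
    return {n: s for n, s in refs.items() if n.startswith("refs/tags/")}


if __name__ == "__main__":
    for name, sha1 in sorted(tags_only(parse_ls_remote(SAMPLE)).items()):
        print(name, sha1)
```

Keying the result by ref name makes it easy to compare against local
refs, which is what a "store tags under local refs/" option would need.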
This is produced by connecting directly to the git-daemon
running on the remote side and talking the upload-pack protocol
with it. A new helper program, "git-peek-remote", is used to do
this when we use a git:// URL. For an rsync URL, everything under
its refs/ is copied to a temporary directory to produce the same
information.
To support the same on a dumb transport, I gave the server side
a new command, "git update-server-info", which prepares this
information in "$repo/info/refs", so writing http support for
"git ls-remote" using curl is trivial. I arranged things so
that update-server-info is run whenever you push into the
repository via "git push". You can of course run it by hand
from the command line.
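
To make the idea of a precomputed "$repo/info/refs" concrete, here is a
small Python sketch of what such a step could do. This is not the actual
update-server-info code: the function names are invented, it only looks
at loose ref files under refs/, and the tab-separated "<sha1> <refname>"
layout is an assumption chosen to mirror the ls-remote output.

```python
import os
import tempfile


def collect_refs(repo_dir):
    """Return (sha1, refname) pairs for ref files under repo_dir/refs."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_dir, "refs")):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path) as f:
                sha1 = f.read().strip()
            # Ref names always use forward slashes, e.g. refs/heads/master.
            name = os.path.relpath(path, repo_dir).replace(os.sep, "/")
            pairs.append((sha1, name))
    return sorted(pairs, key=lambda pair: pair[1])


def write_info_refs(repo_dir):
    """Write the collected refs to repo_dir/info/refs and return its path."""
    info_dir = os.path.join(repo_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    out_path = os.path.join(info_dir, "refs")
    with open(out_path, "w") as f:
        for sha1, name in collect_refs(repo_dir):
            # Assumed layout: object name, a tab, then the ref name.
            f.write("%s\t%s\n" % (sha1, name))
    return out_path


def demo():
    """Build a throwaway directory with two refs and return the file text."""
    with tempfile.TemporaryDirectory() as repo:
        for name, sha1 in [
            ("refs/heads/master", "17c0bd743c1c8113cd0ed72b7ca1776d13c27e01"),
            ("refs/tags/v0.99", "d6602ec5194c87b0fc87103ca4d67251c76f233a"),
        ]:
            path = os.path.join(repo, *name.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write(sha1 + "\n")
        with open(write_info_refs(repo)) as f:
            return f.read()


if __name__ == "__main__":
    print(demo(), end="")
```

Because the result is a single static file, a client on a dumb transport
can fetch it with one request instead of needing a directory listing.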
The other file that update-server-info produces is to help dumb
pullers. It is stored in "$repo/objects/info/pack", and looks
like this:
P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
D 0 1
D 1
T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
T 1 06f6d9e2f140466eeb41e494e14167f90210f89d tag
T 1 26791a8bcf0e6d33f43aef7682bdb555236d56de tag
T 1 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c tag
T 1 701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e tag
T 1 733ad933f62e82ebc92fed988c7f0795e64dea62 tag
T 1 9e734775f7c22d2f89943ad6c745571f1930105f tag
T 1 c521cb0f10ef2bf28a18e1cc8adf378ccbbe5a19 tag
T 1 ebb5573ea8beaf000d4833735f3e53acb9af844c tag
The lines that start with a 'P' list all the packs available in
this object database (relative to $repo/objects/pack). These
packs are implicitly numbered starting a