Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Linus Torvalds


On Tue, 16 Aug 2005, Junio C Hamano wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > We've got a "git prune-packed", it would be good to have a "git
> > prune-alternate" or something equivalent.
> 
> If you have the GIT_ALTERNATE_OBJECT_DIRECTORIES environment
> variable set, git prune-packed will remove objects from your
> repository if they are found in somebody else's pack.  I am not
> sure if this is the behaviour we would want.

Well, it may be good enough if the "master" repository is regularly 
packed..

Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:

> We've got a "git prune-packed", it would be good to have a "git
> prune-alternate" or something equivalent.

If you have the GIT_ALTERNATE_OBJECT_DIRECTORIES environment
variable set, git prune-packed will remove objects from your
repository if they are found in somebody else's pack.  I am not
sure if this is the behaviour we would want.



Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:

> Btw, looking at the code, it strikes me that using ":" to separate the 
> alternate object directories in the file is rather strange.

Yes, I admit that one was done in a quick and dirty way.  Patches
welcome [*1*] ;-)

> Anyway, I don't think "alternates" is necessarily sensible as a "object"  
> information. Sure, it's about alternate objects, but the thing is, object 
> directories can be shared across many projects, but "alternates" to me 
> makes more sense as a per-project thing.

Well, I have to think about this a bit more, but I have to say
there was some thinking behind the way things are right now.

$GIT_DIR/info describes properties of the repository.  That's
why refs, graft and rev-cache go there.

$GIT_OBJECT_DIRECTORY/info describes the properties of the
object pool (I am inventing words as I speak, but an object pool
is a directory that can be combined with other object pools to
make an object database).  So objects/info/packs talks about the
packs in it, but not about packs it borrows from its alternates.
The alternates file in question talks about what other object
pools you need to consult to obtain the objects it refers to but
it lacks itself.  If two repositories share a particular object
pool as their .git/objects directory, by symlinking .git/objects
or by using the GIT_OBJECT_DIRECTORY environment variable, it
does not matter from which repository you look at this object
pool.  The set of objects it refers to but lacks itself, and
from which other pools these objects can be obtained, do not
depend on which repository you are looking at it from.  While I
agree with everything
you said about "maybe logical but confusing", I have to disagree
with you about this one.

> What this all is leading up to is that I think we'd be better off with a 
> totally new "git config" file, in ".git/config", and we'd have all the 
> startup configuration there.

I think what _is_ lacking is an easy way to have per repository
configuration that can be shared among "opt-in" developers.  The
graft file naturally falls into this category, and probably the
Porcelain standard .git/info/exclude file as well.  Although we
ended up doing .git/hooks, it is a per repository thing and
logically it _could_ be moved to .git/info/hooks [*2*].  And
that might also be a nice thing to share among "opt-in"
developers.

[Footnote]

*1* Sorry I could not resist --- I always wanted to say this.

*2* I do not think we _should_ move it under .git/info, though.



Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Daniel Barkalow
On Tue, 16 Aug 2005, Linus Torvalds wrote:

> Finally, I have to say that that "info" directory is confusing. Namely,
> there's two of them - the "git info" and the "object info" directories are
> totally different directories - maybe logical, but to me it smells like
> "info" is here a code-name for "misc files that don't make sense anywhere
> else".
>
> What this all is leading up to is that I think we'd be better off with a
> totally new "git config" file, in ".git/config", and we'd have all the
> startup configuration there. Including things like alternate object
> directories, perhaps standard preferences for that particular repo, and
> things like the "grafts" thing.
>
> Wouldn't that be nice?

I'd originally proposed the .git/info directory because I keep multiple
working trees for the same repository, by having symlinks for .git/objects
and .git/refs, and I could also get other per-repository things to be
shared properly without knowing exactly what they are if they're in a
subdirectory of .git that could be a symlink. This would mean that a
".git/config" would be per-working-tree, like .git/index or .git/HEAD, not
per-repository like ".git/info/config". Of course, the core didn't have
anything to go in .git/info at the time, so it didn't really get tacked
down.

(I find it convenient to have mainline and my latest work both checked out
for reference while I'm generating a series of commits for a patch set,
and I don't want three different repositories which could be out of sync;
this also keeps the repository safely out of pwd, since I have the actual
repositories as ~/git/{project}.git/)

-Daniel
*This .sig left intentionally blank*


Re: [PATCH] Alternate object pool mechanism updates.

2005-08-16 Thread Linus Torvalds


On Sun, 14 Aug 2005, Junio C Hamano wrote:

> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > I think this is great - especially for places like kernel.org, where a lot 
> > of repos end up being related to each other, yet independent.
> 
> Yes.  There is one shortcoming in the current git-clone -s in
> the proposed updates branch.  If the parent repository has
> alternates on its own, that information should be copied to the
> cloned one as well (e.g. Jeff has alternates pointing at you,
> and I clone from Jeff with -s flag --- I should list not just
> Jeff but also you to borrow from in my alternates file).

Btw, looking at the code, it strikes me that using ":" to separate the 
alternate object directories in the file is rather strange.

Maybe allow a different format for the file? Or at least allow '\n' as an 
alternate separator (but it would be nice to allow comments too).
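
To make the suggestion concrete, here is a minimal sketch of a reader
that accepts both ':' and '\n' as separators and skips comment lines.
It is not taken from the patch: the names alt_entry and
parse_alternates are hypothetical, and the rule that a line starting
with '#' is a comment is an assumption about what "allow comments"
could mean.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alternate object directory, as a slice of the input buffer. */
struct alt_entry {
	const char *path;
	size_t len;
};

/*
 * Hypothetical parser, not taken from the patch.  Entries are separated
 * by ':' or '\n', empty entries are skipped, and a line whose first
 * character is '#' is treated as a comment (an assumed convention).
 * Stores at most `max` entries in `out` and returns the total number
 * of entries found.
 */
static size_t parse_alternates(const char *buf, size_t size,
			       struct alt_entry *out, size_t max)
{
	const char *cp = buf, *ep = buf + size;
	size_t count = 0;

	assert(buf != NULL);
	while (cp < ep) {
		const char *eol = memchr(cp, '\n', (size_t)(ep - cp));
		const char *line_end = eol ? eol : ep;
		const char *last = cp;

		/* Skip the whole line if it is a comment. */
		while (*cp != '#' && last < line_end) {
			const char *sep = memchr(last, ':',
						 (size_t)(line_end - last));
			const char *end = sep ? sep : line_end;

			if (end != last) {
				if (count < max) {
					out[count].path = last;
					out[count].len = (size_t)(end - last);
				}
				count++;
			}
			last = sep ? sep + 1 : line_end;
		}
		cp = eol ? eol + 1 : ep;
	}
	return count;
}
```

Because ':' is still accepted, a file written in the current
colon-separated format would parse the same way, so the two formats
could coexist.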

Finally, I have to say that that "info" directory is confusing. Namely,
there's two of them - the "git info" and the "object info" directories are
totally different directories - maybe logical, but to me it smells like
"info" is here a code-name for "misc files that don't make sense anywhere
else".

Anyway, I don't think "alternates" is necessarily sensible as a "object"  
information. Sure, it's about alternate objects, but the thing is, object 
directories can be shared across many projects, but "alternates" to me 
makes more sense as a per-project thing.

What this all is leading up to is that I think we'd be better off with a 
totally new "git config" file, in ".git/config", and we'd have all the 
startup configuration there. Including things like alternate object 
directories, perhaps standard preferences for that particular repo, and 
things like the "grafts" thing.

Wouldn't that be nice?

Linus


Re: [PATCH] Alternate object pool mechanism updates.

2005-08-14 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:

> I think this is great - especially for places like kernel.org, where a lot 
> of repos end up being related to each other, yet independent.

Yes.  There is one shortcoming in the current git-clone -s in
the proposed updates branch.  If the parent repository has
alternates on its own, that information should be copied to the
cloned one as well (e.g. Jeff has alternates pointing at you,
and I clone from Jeff with -s flag --- I should list not just
Jeff but also you to borrow from in my alternates file).
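
A rough sketch of what that copying amounts to, in the current
colon-separated format.  The helper name compose_alternates is
hypothetical, and the sketch assumes the parent's alternates are
absolute paths with no trailing newline; relative paths would have to
be resolved against the parent first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not from git-clone-script.  Writes into `out`
 * the alternates content for a repository cloned with -s: the parent's
 * own object directory first, then whatever the parent itself borrows
 * from.  Returns the length that was needed, like snprintf(), so the
 * caller can detect truncation.
 */
static int compose_alternates(char *out, size_t outsz,
			      const char *parent_objdir,
			      const char *parent_alternates)
{
	assert(parent_objdir != NULL);
	if (parent_alternates && *parent_alternates)
		return snprintf(out, outsz, "%s:%s",
				parent_objdir, parent_alternates);
	return snprintf(out, outsz, "%s", parent_objdir);
}
```

In the Jeff example, calling it with Jeff's object directory and the
contents of Jeff's alternates file would yield a list naming Jeff
first and then you.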

> However, exactly for places like kernel.org it would _also_ be nice if
> there was some way to prune objects that have been merged back into the
> parent.

Yes.  Another possibility is to use git-relink which was written
exactly to solve this in a different way.



Re: [PATCH] Alternate object pool mechanism updates.

2005-08-14 Thread Linus Torvalds


On Sun, 14 Aug 2005, Junio C Hamano wrote:
> 
> Ok, so the one in the proposed updates branch says
> info/alternates.
> 
> With this, your recent cg-clone -l can be made to still use
> individual .git/objects/??/ hierarchy to keep objects newly
> created in each repository while sharing the inherited objects
> from the parent repository, which would probably alleviate the
> multi-user environment worries you express in the comments for
> the option.  The git-clone-script in the proposed updates branch
> has such a change.

I think this is great - especially for places like kernel.org, where a lot 
of repos end up being related to each other, yet independent.

However, exactly for places like kernel.org it would _also_ be nice if
there was some way to prune objects that have been merged back into the
parent. In other words, imagine that people start using my kernel tree as
their source of "alternate" objects, which works wonderfully well, but
then as I pull from them, nothing ever removes the objects that are now
duplicate.

We've got a "git prune-packed", it would be good to have a "git
prune-alternate" or something equivalent.
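
The core check such a tool would need could look roughly like the
sketch below.  No such command exists in this thread; the function
names are hypothetical, and the sketch only looks at loose objects
in the alternate, so an object that the parent has already packed
would not be noticed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "<objdir>/xx/yyyy..." for a 40-character hex object name.
 * Returns 0 on success, -1 if the name is malformed or does not fit.
 */
static int loose_object_path(char *out, size_t outsz,
			     const char *objdir, const char *hex)
{
	int n;

	assert(objdir != NULL && hex != NULL);
	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outsz, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/*
 * Hypothetical check: a local loose object is a candidate for removal
 * if a file of the same name exists in the alternate object directory.
 * Packs in the alternate are deliberately ignored in this sketch.
 */
static int is_redundant_in_alternate(const char *alt_objdir,
				     const char *hex)
{
	char path[4096];

	if (loose_object_path(path, sizeof(path), alt_objdir, hex) < 0)
		return 0;
	return access(path, F_OK) == 0;
}
```

A real implementation would walk the local .git/objects/??/
directories, run this check against every listed alternate, and
unlink only the local copies that are found elsewhere.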

Linus


Re: [PATCH] Alternate object pool mechanism updates.

2005-08-14 Thread Junio C Hamano
Petr Baudis <[EMAIL PROTECTED]> writes:

> What about calling it rather info/alternates (or info/alternate)? It
> looks better, sounds better, is more namespace-ecological, tab-completes
> fine and you don't type it that often anyway. :-)

Ok, so the one in the proposed updates branch says
info/alternates.

With this, your recent cg-clone -l can be made to still use
individual .git/objects/??/ hierarchy to keep objects newly
created in each repository while sharing the inherited objects
from the parent repository, which would probably alleviate the
multi-user environment worries you express in the comments for
the option.  The git-clone-script in the proposed updates branch
has such a change.




Re: [PATCH] Alternate object pool mechanism updates.

2005-08-13 Thread Junio C Hamano
Petr Baudis <[EMAIL PROTECTED]> writes:

> What about calling it rather info/alternates (or info/alternate)? It
> looks better, sounds better, is more namespace-ecological, tab-completes
> fine and you don't type it that often anyway. :-)

Thanks for the suggestion.  Will fix and keep it in the pu
branch for now just in case somebody else suggests a name even
better.



Re: [PATCH] Alternate object pool mechanism updates.

2005-08-13 Thread Petr Baudis
Dear diary, on Sat, Aug 13, 2005 at 11:09:13AM CEST, I got a letter
where Junio C Hamano <[EMAIL PROTECTED]> told me that...
> It was a mistake to use GIT_ALTERNATE_OBJECT_DIRECTORIES
> environment variable to specify what alternate object pools to
> look for missing objects when working with an object database.
> It is not a property of the process running the git commands,
> but a property of the object database that is partial and needs
> other object pools to complete the set of objects it lacks.
> 
> This patch allows you to have $GIT_OBJECT_DIRECTORY/info/alt
> file whose contents is in exactly the same format as the
> environment variable, to let an object database name alternate
> object pools it depends on.
> 
> Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>

What about calling it rather info/alternates (or info/alternate)? It
looks better, sounds better, is more namespace-ecological, tab-completes
fine and you don't type it that often anyway. :-)

-- 
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone.  -- Alan Cox


[PATCH] Alternate object pool mechanism updates.

2005-08-13 Thread Junio C Hamano
It was a mistake to use GIT_ALTERNATE_OBJECT_DIRECTORIES
environment variable to specify what alternate object pools to
look for missing objects when working with an object database.
It is not a property of the process running the git commands,
but a property of the object database that is partial and needs
other object pools to complete the set of objects it lacks.

This patch allows you to have $GIT_OBJECT_DIRECTORY/info/alt
file whose contents is in exactly the same format as the
environment variable, to let an object database name alternate
object pools it depends on.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---
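
For readers who want the data-structure change in isolation before
reading the diff: the patch replaces a NULL-terminated array with a
singly linked list whose entries carry their path buffer inline.  The
following is only a sketch in that spirit, not the code from the
patch.  The names alt_pool, pool_list, pool_tail and add_pool are
hypothetical, a C99 flexible array member (base[]) is used where the
patch declares base[0], and plain malloc() stands in for xmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct alt_pool {
	struct alt_pool *next;
	char *name;	/* points into base, just past "<dir>/" */
	char base[];	/* "<dir>/" plus room for the object name */
};

static struct alt_pool *pool_list;
static struct alt_pool **pool_tail = &pool_list;

/*
 * Append one object directory to the list.  The tail pointer always
 * addresses the `next` field of the last entry (or the list head),
 * so appending needs no walk over the list.
 */
static struct alt_pool *add_pool(const char *dir, size_t dirlen)
{
	struct alt_pool *ent;

	assert(dir != NULL);
	/* 43 = 40 hex digits + 2 '/' + terminating NUL */
	ent = malloc(sizeof(*ent) + dirlen + 43);
	if (!ent)
		return NULL;

	memcpy(ent->base, dir, dirlen);
	ent->base[dirlen] = '/';
	ent->name = ent->base + dirlen + 1;
	ent->name[0] = '\0';
	ent->next = NULL;

	*pool_tail = ent;
	pool_tail = &ent->next;
	return ent;
}
```

Keeping a pointer to the last `next` field is what lets entries from
the environment variable and from the file be appended in separate
passes without counting them first.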

 cache.h      |    5 +-
 fsck-cache.c |    8 ++-
 sha1_file.c  |  146 --
 3 files changed, 88 insertions(+), 71 deletions(-)

8150a422f79cc461316052b52263289b851d4820
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -278,9 +278,10 @@ struct checkout {
 extern int checkout_entry(struct cache_entry *ce, struct checkout *state);
 
 extern struct alternate_object_database {
-	char *base;
+	struct alternate_object_database *next;
 	char *name;
-} *alt_odb;
+	char base[0]; /* more */
+} *alt_odb_list;
 extern void prepare_alt_odb(void);
 
 extern struct packed_git {
diff --git a/fsck-cache.c b/fsck-cache.c
--- a/fsck-cache.c
+++ b/fsck-cache.c
@@ -456,13 +456,13 @@ int main(int argc, char **argv)
 	fsck_head_link();
 	fsck_object_dir(get_object_directory());
 	if (check_full) {
-		int j;
+		struct alternate_object_database *alt;
 		struct packed_git *p;
 		prepare_alt_odb();
-		for (j = 0; alt_odb[j].base; j++) {
+		for (alt = alt_odb_list; alt; alt = alt->next) {
 			char namebuf[PATH_MAX];
-			int namelen = alt_odb[j].name - alt_odb[j].base;
-			memcpy(namebuf, alt_odb[j].base, namelen);
+			int namelen = alt->name - alt->base;
+			memcpy(namebuf, alt->base, namelen);
 			namebuf[namelen - 1] = 0;
 			fsck_object_dir(namebuf);
 		}
diff --git a/sha1_file.c b/sha1_file.c
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -222,84 +222,100 @@ char *sha1_pack_index_name(const unsigne
 	return base;
 }
 
-struct alternate_object_database *alt_odb;
+struct alternate_object_database *alt_odb_list;
+static struct alternate_object_database **alt_odb_tail;
 
 /*
  * Prepare alternate object database registry.
- * alt_odb points at an array of struct alternate_object_database.
- * This array is terminated with an element that has both its base
- * and name set to NULL.  alt_odb[n] comes from n'th non-empty
- * element from colon separated ALTERNATE_DB_ENVIRONMENT environment
- * variable, and its base points at a statically allocated buffer
- * that contains "/the/directory/corresponding/to/.git/objects/...",
- * while its name points just after the slash at the end of
- * ".git/objects/" in the example above, and has enough space to hold
- * 40-byte hex SHA1, an extra slash for the first level indirection,
- * and the terminating NUL.
- * This function allocates the alt_odb array and all the strings
- * pointed by base fields of the array elements with one xmalloc();
- * the string pool immediately follows the array.
+ *
+ * The variable alt_odb_list points at the list of struct
+ * alternate_object_database.  The elements on this list come from
+ * non-empty elements from colon separated ALTERNATE_DB_ENVIRONMENT
+ * environment variable, and $GIT_OBJECT_DIRECTORY/info/alt file,
+ * whose contents is exactly in the same format as that environment
+ * variable.  Its base points at a statically allocated buffer that
+ * contains "/the/directory/corresponding/to/.git/objects/...", while
+ * its name points just after the slash at the end of ".git/objects/"
+ * in the example above, and has enough space to hold 40-byte hex
+ * SHA1, an extra slash for the first level indirection, and the
+ * terminating NUL.
  */
-void prepare_alt_odb(void)
+static void link_alt_odb_entries(const char *alt, const char *ep)
 {
-	int pass, totlen, i;
 	const char *cp, *last;
-	char *op = NULL;
-	const char *alt = gitenv(ALTERNATE_DB_ENVIRONMENT) ? : "";
+	struct alternate_object_database *ent;
+
+	last = alt;
+	do {
+		for (cp = last; cp < ep && *cp != ':'; cp++)
+			;
+		if (last != cp) {
+			/* 43 = 40-byte + 2 '/' + terminating NUL */
+			int pfxlen = cp - last;
+			int entlen = pfxlen + 43;
+
+			ent = xmalloc(sizeof(*ent) + entlen);
+			*alt_odb_tail = ent;
+			alt_odb_tail = &(ent->next);
+			ent->next = NULL;
+
+			memcpy(ent->base, last, p