Storing refs in the odb (was: Re: [PATCH 00/17] Remove assumptions about refname lifetimes)

2013-05-20 Thread Johan Herland
On Mon, May 20, 2013 at 2:15 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 This is a very interesting idea.  It's turtles all the way down.

:)

 On 05/20/2013 12:28 PM, Johan Herland wrote:
 For server-class installations we need ref storage that can be read
 (and updated?) atomically, and the current system of loose + packed
 files won't work since reading (and updating) more than a single file
 is not an atomic operation. Trivially, one could resolve this by
 dropping loose refs, and always using a single packed-refs file, but
 that would make it prohibitively expensive to update refs (the entire
 packed-refs file must be rewritten for every update).
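
To make the cost concrete, here is a rough sketch of a single-file update. The helper `update_packed_ref` is hypothetical (it is not git code), and it assumes a simplified file with only "<id> <refname>" lines and "#" comments, ignoring peeled entries and sort order. Changing one ref writes a complete new copy of the file:

```python
import os
import tempfile


def update_packed_ref(path, refname, new_id):
    """Rewrite the whole file to change one ref; return the bytes written."""
    with open(path) as f:
        lines = f.read().splitlines()
    out, found = [], False
    for line in lines:
        if not line.startswith("#") and line.split(" ", 1)[1] == refname:
            line, found = f"{new_id} {refname}", True
        out.append(line)
    if not found:
        out.append(f"{new_id} {refname}")
    data = "\n".join(out) + "\n"
    # Write a complete new copy, then atomically swap it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, path)
    return len(data)


# Made-up repository with 1000 branches, all in one file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "packed-refs")
with open(path, "w") as f:
    for i in range(1000):
        f.write(f"{'0' * 40} refs/heads/topic-{i:04d}\n")

written = update_packed_ref(path, "refs/heads/topic-0500", "f" * 40)
```

Here `written` equals the size of the whole file, although only one line changed.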

 Correct, or the packed-refs file would have to be updated in place
 using some database-style approach for locking/transactions/whatever.

 Now, observe that we don't have these race conditions in the object
 database, because it is an add-only immutable data store.

 Except for prune, of course, which can cause race conditions WRT writers.

Yes, but that is a different race, in need of a different solution.
E.g. that race is concerned with pruning unreachable objects that are
about to become reachable by a concurrent operation, which is AFAICS
independent from the ref update race that we're discussing here.

 What if we stored the refs as a tree object in the object database,
 referenced by a single (loose) ref? There would be a _single_ (albeit
 highly contentious) file outside the object database that represents
 the current state of the refs, but hopefully we can guarantee
 atomicity when reading (and updating?) that one file. Transactions can
 be done by:
  1. Recording the tree id holding the refs before starting manipulation.
  2. Creating a new tree object holding the manipulated state.
  3. Re-checking the tree id before replacing the loose ref. If
 unchanged: commit, else: rollback/error out.
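
The three steps above amount to an optimistic compare-and-swap on the single root ref. A minimal in-memory sketch follows, assuming a hypothetical `RefStore` class (not git code): a dict stands in for the object database, JSON snapshots stand in for tree objects, and a `threading.Lock` stands in for whatever would make the final replace of the loose ref atomic on disk.

```python
import hashlib
import json
import threading


class RefStore:
    """Toy model: refs live in immutable snapshots; one root id is mutable."""

    def __init__(self):
        self._objects = {}             # add-only "object database"
        self._lock = threading.Lock()  # stands in for an atomic file replace
        self._root = self._write({})   # id of the current refs snapshot

    def _write(self, refs):
        # Content-addressed: identical ref states get identical ids.
        data = json.dumps(refs, sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self._objects[oid] = data
        return oid

    def snapshot(self):
        """Step 1: record the current root id and read the refs it names."""
        root = self._root
        return root, json.loads(self._objects[root])

    def commit(self, expected_root, new_refs):
        """Steps 2 and 3: write the new state, then swap only if unchanged."""
        new_root = self._write(new_refs)
        with self._lock:
            if self._root != expected_root:
                return False           # someone else won: roll back or retry
            self._root = new_root
            return True


store = RefStore()
root_a, refs_a = store.snapshot()
root_b, refs_b = store.snapshot()      # a concurrent writer on the same base

refs_a["refs/heads/master"] = "1" * 40
first = store.commit(root_a, refs_a)   # base unchanged, accepted

refs_b["refs/heads/next"] = "2" * 40
second = store.commit(root_b, refs_b)  # base is stale, rejected
```

The rejected writer loses nothing already stored: its new snapshot stays in the add-only store as an unreferenced object, and it can re-read the root and retry.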

 There are two closely related possibilities and I'm not sure which one
 you mean:

 * Effectively treat all of the refs as loose refs, but stored not in the
 filesystem but rather in a hierarchical tree structure in the object
 database.  E.g., all of the refs directly under refs/heads would be in
 one tree object, those in refs/remotes/foo in a second, those for
 refs/remotes/bar in another etc. and all of them linked up together in a
 tree object representing refs.

 * Effectively treat all of the refs as packed refs, but store the single
 packed-refs file as a single object in the object database.

 (The first alternative sounds more practical to me.  I also guess that's
 what you mean, since down below you say that each change would require
 producing a few objects.)

The first alternative is what I had in mind.

Initially I thought to record it as if one were to record a new tree
using .git/refs as the root of your worktree (having exploded all
packed-refs into loose refs). I.e. you would have "heads", "tags", and
"remotes" as subtrees of the reference tree, and then e.g. in the
"heads" subtree, there would be an entry named "master" pointing to a
_blob_, and the contents of that blob would be the commit id of the
current tip of the master branch.
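
A rough sketch of that layout, under stated assumptions: the helpers `write_object`, `nest`, and `write_tree` are hypothetical names, a dict stands in for the object database, and the ref ids are made up. Objects are hashed the way git hashes loose objects (SHA-1 over "<type> <size>\0" plus the body), and each ref becomes a blob holding the target id.

```python
import hashlib


def write_object(kind, body, odb):
    """Store an object under its git-style SHA-1 id; return the raw 20 bytes."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    digest = hashlib.sha1(header + body)
    odb[digest.hexdigest()] = (kind, body)
    return digest.digest()


def nest(refs):
    """Turn {'refs/heads/master': id} into {'heads': {'master': id}}."""
    root = {}
    for refname, target in refs.items():
        *dirs, leaf = refname.split("/")[1:]   # drop the leading "refs"
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = target
    return root


def write_tree(node, odb):
    """Write nested dicts as trees; each ref becomes a blob holding its id."""
    entries = []
    for name, value in node.items():
        if isinstance(value, dict):
            mode, raw = b"40000", write_tree(value, odb)
            key = name.encode() + b"/"         # git sorts subtrees as "name/"
        else:
            blob = value.encode() + b"\n"
            mode, raw = b"100644", write_object("blob", blob, odb)
            key = name.encode()
        entries.append((key, mode + b" " + name.encode() + b"\0" + raw))
    body = b"".join(entry for _, entry in sorted(entries))
    return write_object("tree", body, odb)


odb = {}
refs = {
    "refs/heads/master": "1" * 40,
    "refs/heads/next": "2" * 40,
    "refs/tags/v1.0": "3" * 40,
}
root_before = write_tree(nest(refs), odb).hex()

refs["refs/heads/master"] = "4" * 40           # move one branch
root_after = write_tree(nest(refs), odb).hex()
```

Moving one branch adds only three objects in this sketch (a new blob, a new "heads" tree, and a new root tree); the "tags" subtree is reused unchanged.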

Obviously the next optimization would be to drop the master -> blob
-> commit indirection, and use master -> commit instead, i.e. the
master tree entry corresponds directly to the commit to which it
points (symrefs would naturally be recorded as symlinks). This would
automatically provide reachability for all refs, but as you correctly
observe:

 Of course in either case we couldn't use a tree object directly, because
 these new reference tree objects would refer not only to blobs and
 other trees but also to commits and tags.

Indeed. I don't know if the best solution would be to actually _allow_
that (which would complicate the object parsing code somewhat; a tree
entry pointing to a commit is usually interpreted as a submodule, but
that is not what we'd want for the ref tree, and a tree entry pointing
at a tag has AFAIK not yet been done), or whether it means we need to
come up with a different kind of structure.

 [I know this is not what you are suggesting, but I am reminded of
 Subversion, which stores trunk, branches, and tags in the same tree
 space as the contents of the working trees.  A Subversion commit
 references a gigantic tree encompassing all branches of development and
 all files on all of those branches (with cheap copies to reduce the
 redundancy):

 /
 /trunk/
 /trunk/Makefile
 /trunk/src/
 /trunk/src/foo.c
 /branches/
 /branches/next/
 /branches/next/Makefile
 /branches/next/src/
 /branches/next/src/foo.c
 /branches/pu/
 /branches/pu/Makefile
 /branches/pu/src/
 /branches/pu/src/foo.c
 /tags/
 /tags/v1.8.2/
 /tags/v1.8.2/Makefile
 /tags/v1.8.2/src/
 /tags/v1.8.2/src/foo.c
 etc...

 A Subversion commit thus describes the state of *every* branch and tag
 at that moment in time.  The model is conceptually very simple (in 

Re: Storing refs in the odb

2013-05-20 Thread Junio C Hamano
Johan Herland jo...@herland.net writes:

 Of course in either case we couldn't use a tree object directly, because
 these new reference tree objects would refer not only to blobs and
 other trees but also to commits and tags.

 Indeed. I don't know if the best solution would be to actually _allow_
 that (which would complicate the object parsing code somewhat; a tree
 entry pointing to a commit is usually interpreted as a submodule, but
 that is not what we'd want for the ref tree, and a tree entry pointing
 at a tag has AFAIK not yet been done), or whether it means we need to
 come up with a different kind of structure.

You can disallow that only by giving up on being able to express
Linus's kernel repository, which has an oddball v2.6.11-tree tag.

I do not think that that particular tag in the particular repository
is too big a show-stopper; if it is only Linus, we can ask him to
drop that tag (he has v2.6.11 tag object that points at the tree, so
the users do not lose anything) and be done with it.

But if there are other repositories that tag trees in a similar way,
that would be a real regression.  We cannot just go ask people to
change their workflow that depended on using refs that directly
point at trees overnight.
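
One way to check a given repository for such refs would be to filter the output of `git for-each-ref --format='%(objecttype) %(refname)'`. A small sketch that parses output of that shape follows; the function name and the sample text are made up for illustration.

```python
def refs_pointing_at_trees(for_each_ref_output):
    """Return the refnames whose object type is 'tree'."""
    hits = []
    for line in for_each_ref_output.splitlines():
        objecttype, _, refname = line.partition(" ")
        if objecttype == "tree":
            hits.append(refname)
    return hits


# Made-up sample output, one "<objecttype> <refname>" pair per line.
sample = """\
commit refs/heads/master
tag refs/tags/v1.0
tree refs/tags/snapshot-tree
"""

tree_refs = refs_pointing_at_trees(sample)
```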
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Storing refs in the odb

2013-05-20 Thread Johan Herland
On Mon, May 20, 2013 at 7:21 PM, Junio C Hamano gits...@pobox.com wrote:
 Johan Herland jo...@herland.net writes:

 Of course in either case we couldn't use a tree object directly, because
 these new reference tree objects would refer not only to blobs and
 other trees but also to commits and tags.

 Indeed. I don't know if the best solution would be to actually _allow_
 that (which would complicate the object parsing code somewhat; a tree
 entry pointing to a commit is usually interpreted as a submodule, but
 that is not what we'd want for the ref tree, and a tree entry pointing
 at a tag has AFAIK not yet been done), or whether it means we need to
 come up with a different kind of structure.

 You can disallow that only by giving up on being able to express
 Linus's kernel repository, which has an oddball v2.6.11-tree tag.

 I do not think that that particular tag in the particular repository
 is too big a show-stopper; if it is only Linus, we can ask him to
 drop that tag (he has v2.6.11 tag object that points at the tree, so
 the users do not lose anything) and be done with it.

 But if there are other repositories that tag trees in a similar way,
 that would be a real regression.  We cannot just go ask people to
 change their workflow that depended on using refs that directly
 point at trees overnight.

I wasn't considering disallowing _anything_, rather open up to the
idea that a tree object might refer to tag objects as well as
commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted
scheme, I was considering whether the tree entry at the virtual path
refs/tags/v1.0 should look like this:

  100644 blob 123456... v1.0

where the blob at 123456... contains the object id of the v1.0 tag
object, or whether we should allow the craziness that is:

  ?? tag 987654... v1.0

Just a thought experiment...
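
The two entry shapes can be compared in a short sketch. Everything here is hypothetical: the `Entry` type, the `resolve` helper, and the ids are made up, and the "??" mode is kept as in the mail because no mode is defined for a tag entry.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mode: str   # "??" where the mail leaves the mode undecided
    kind: str   # object type named by the entry
    oid: str
    name: str


def resolve(entry, odb, indirect):
    """Return the id of the object that the ref `entry.name` points at."""
    if indirect:
        # Blob form: the blob's content is the target id (one extra read).
        return odb[entry.oid].decode().strip()
    # Direct form: the entry names the target itself.
    return entry.oid


TAG_ID = "9" * 40                           # made-up id of the v1.0 tag object
odb = {"b" * 40: TAG_ID.encode() + b"\n"}   # made-up blob holding that id

via_blob = Entry("100644", "blob", "b" * 40, "v1.0")
direct = Entry("??", "tag", TAG_ID, "v1.0")

target_via_blob = resolve(via_blob, odb, indirect=True)
target_direct = resolve(direct, odb, indirect=False)
```

Both forms name the same tag object. The blob form stays within what a tree can hold today but hides the target inside blob content, while the direct form makes the tag reachable from the tree itself.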

...Johan

-- 
Johan Herland, jo...@herland.net
www.herland.net


Re: Storing refs in the odb

2013-05-20 Thread Junio C Hamano
Johan Herland jo...@herland.net writes:

 I wasn't considering disallowing _anything_, rather open up to the
 idea that a tree object might refer to tag objects as well as
 commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted
 scheme, I was considering whether the tree entry at the virtual path
 refs/tags/v1.0 should look like this:

   100644 blob 123456... v1.0

 where the blob at 123456... contains the object id of the v1.0 tag
 object, or whether we should allow the craziness that is:

   ?? tag 987654... v1.0

 Just a thought experiment...

I was reacting to this part of your earlier message:

 Of course in either case we couldn't use a tree object directly, because
 these new reference tree objects would refer not only to blobs and
 other trees but also to commits and tags.

 Indeed. I don't know if the best solution would be to actually _allow_
 that (which would complicate the object parsing code somewhat; a tree

You cannot disambiguate, with the thought-experiment in your message
I am responding to, between these two:

?? tree 987654... v2.6.11-tree
?? tree 987654... sub

where the former is a light-weight tag for that tree, while the
latter is merely a subhierarchy in refs/sub/hier/archy, but if you
disallow v2.6.11-tree, and if you know this kind of tree is only to
express the ref hierarchy, then everything is unambiguous (a commit
is not a submodule but is a ref that points at a commit, a blob is a
ref that points at a blob like refs/tags/junio-gpg-pub, and a tag is a
ref that points at the tag).

So it was a workable alternative implementation of refs (I am not
saying it is an improvement, with the atomicity and performance
implications we already discussed), if we did not have to worry
about a light-weight tag that directly points at a tree.
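
The ambiguity can be spelled out in a few lines. This is only an illustration with a hypothetical `interpret` helper: given the object type of an entry in a ref tree, it lists every meaning the entry could have.

```python
def interpret(kind, allow_tree_refs):
    """List the meanings a ref-tree entry of this object type could have."""
    meanings = {
        "commit": ["ref to commit"],   # not a submodule inside a ref tree
        "blob": ["ref to blob"],       # e.g. refs/tags/junio-gpg-pub
        "tag": ["ref to tag object"],
        "tree": ["subhierarchy"],
    }[kind][:]
    if kind == "tree" and allow_tree_refs:
        meanings.append("ref to tree")  # e.g. v2.6.11-tree
    return meanings


with_tree_refs = interpret("tree", allow_tree_refs=True)
without_tree_refs = interpret("tree", allow_tree_refs=False)
```

Only the tree case ever yields two readings, and only when refs that point directly at trees are allowed.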



Re: Storing refs in the odb

2013-05-20 Thread Johan Herland
On Mon, May 20, 2013 at 8:28 PM, Junio C Hamano gits...@pobox.com wrote:
 Johan Herland jo...@herland.net writes:

 I wasn't considering disallowing _anything_, rather open up to the
 idea that a tree object might refer to tag objects as well as
 commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted
 scheme, I was considering whether the tree entry at the virtual path
 refs/tags/v1.0 should look like this:

   100644 blob 123456... v1.0

 where the blob at 123456... contains the object id of the v1.0 tag
 object, or whether we should allow the craziness that is:

   ?? tag 987654... v1.0

 Just a thought experiment...

 I was reacting to this part of your earlier message:

 Of course in either case we couldn't use a tree object directly, because
 these new reference tree objects would refer not only to blobs and
 other trees but also to commits and tags.

 Indeed. I don't know if the best solution would be to actually _allow_
 that (which would complicate the object parsing code somewhat; a tree

 You cannot disambiguate, with the thought-experiment in your message
 I am responding to, between these two:

 ?? tree 987654... v2.6.11-tree
 ?? tree 987654... sub

 where the former is a light-weight tag for that tree, while the
 latter is merely a subhierarchy in refs/sub/hier/archy, but if you
 disallow v2.6.11-tree, and if you know this kind of tree is only to
 express the ref hierarchy, then everything is unambiguous (a commit
 is not a submodule but is a ref that points at a commit, a blob is a
 ref that points at a blob like refs/tags/junio-gpg-pub, and a tag is a
 ref that points at the tag).

 So it was a workable alternative implementation of refs (I am not
 saying it is an improvement, with the atomicity and performance
 implications we already discussed), if we did not have to worry
 about a light-weight tag that directly points at a tree.

True, unless we were to abuse the mode bits to differentiate between
regular-subtree and ref-to-tree cases...
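
A sketch of what that could look like. The first three constants are the modes git uses in tree entries today; `MODE_REF_TO_TREE` is purely hypothetical, with a value picked arbitrarily for illustration, and a mode for tag entries is left out because none is defined.

```python
# Modes git uses in tree entries today.
MODE_TREE = 0o040000
MODE_BLOB = 0o100644
MODE_GITLINK = 0o160000   # a commit entry, normally read as a submodule

# HYPOTHETICAL: not a real git mode; the value is arbitrary.
MODE_REF_TO_TREE = 0o050000


def classify(mode):
    """Meaning of an entry inside a ref tree (a sketch, not git behaviour)."""
    if mode == MODE_TREE:
        return "subhierarchy"
    if mode == MODE_REF_TO_TREE:
        return "ref to tree"
    if mode == MODE_GITLINK:
        return "ref to commit"
    if mode == MODE_BLOB:
        return "ref to blob"
    raise ValueError(f"unexpected mode {mode:o}")


plain = classify(MODE_TREE)
tagged = classify(MODE_REF_TO_TREE)
```

With a distinct mode, the two tree entries that were indistinguishable by object type alone get different readings.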

...Johan

-- 
Johan Herland, jo...@herland.net
www.herland.net