Storing refs in the odb (was: Re: [PATCH 00/17] Remove assumptions about refname lifetimes)
On Mon, May 20, 2013 at 2:15 PM, Michael Haggerty mhag...@alum.mit.edu wrote:

This is a very interesting idea. It's turtles all the way down. :)

On 05/20/2013 12:28 PM, Johan Herland wrote:

For server-class installations we need ref storage that can be read (and updated?) atomically, and the current system of loose + packed files won't work since reading (and updating) more than a single file is not an atomic operation. Trivially, one could resolve this by dropping loose refs, and always using a single packed-refs file, but that would make it prohibitively expensive to update refs (the entire packed-refs file must be rewritten for every update).

Correct, or the packed-refs file would have to be updated in place using some database-style approach for locking/transactions/whatever.

Now, observe that we don't have these race conditions in the object database, because it is an add-only immutable data store.

Except for prune, of course, which can cause race conditions WRT writers.

Yes, but that is a different race, in need of a different solution. E.g. that race is concerned with pruning unreachable objects that are about to become reachable by a concurrent operation, which is AFAICS independent from the ref update race that we're discussing here.

What if we stored the refs as a tree object in the object database, referenced by a single (loose) ref? There would be a _single_ (albeit highly contentious) file outside the object database that represents the current state of the refs, but hopefully we can guarantee atomicity when reading (and updating?) that one file. Transactions can be done by:

1. Recording the tree id holding the refs before starting manipulation.
2. Creating a new tree object holding the manipulated state.
3. Re-checking the tree id before replacing the loose ref. If unchanged: commit, else: rollback/error out.
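To make the three transaction steps concrete, here is a minimal Python sketch. It is not git code: `ToyRepo`, `write_tree`, `compare_and_swap_root` and `update_ref` are hypothetical names, the "object database" is an in-memory dict, and a lock stands in for whatever makes replacing the single loose ref atomic on a real filesystem.

```python
import hashlib
import threading


class ToyRepo:
    """In-memory stand-in for an add-only object store plus one mutable root ref."""

    def __init__(self):
        self.objects = {}                  # object id -> dict of refname -> target id
        self._lock = threading.Lock()      # assumption: models an atomic file replace
        self.root = self.write_tree({})    # the single "loose ref" outside the store

    def write_tree(self, refs):
        # Content-addressed and add-only: existing objects are never modified.
        payload = "\n".join(f"{name} {target}" for name, target in sorted(refs.items()))
        oid = hashlib.sha1(payload.encode()).hexdigest()
        self.objects[oid] = dict(refs)
        return oid

    def compare_and_swap_root(self, expected, new):
        # Step 3: re-check the tree id and replace it only if it is unchanged.
        with self._lock:
            if self.root != expected:
                return False
            self.root = new
            return True


def update_ref(repo, name, target):
    """Return True on commit, False if a concurrent update forces a rollback."""
    old_root = repo.root                   # step 1: record the current tree id
    refs = dict(repo.objects[old_root])
    refs[name] = target
    new_root = repo.write_tree(refs)       # step 2: new tree with the manipulated state
    return repo.compare_and_swap_root(old_root, new_root)
```

A caller that gets `False` back would re-read the root and retry (or error out). The orphaned tree written in step 2 is harmless in this model because nothing references it.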
There are two closely related possibilities and I'm not sure which one you mean:

* Effectively treat all of the refs as loose refs, but stored not in the filesystem but rather in a hierarchical tree structure in the object database. E.g., all of the refs directly under refs/heads would be in one tree object, those in refs/remotes/foo in a second, those for refs/remotes/bar in another etc. and all of them linked up together in a tree object representing refs.

* Effectively treat all of the refs as packed refs, but store the single packed-refs file as a single object in the object database.

(The first alternative sounds more practical to me. I also guess that's what you mean, since down below you say that each change would require producing a few objects.)

The first alternative is what I had in mind. Initially I thought to record it as if one were to record a new tree using .git/refs as the root of your worktree (having exploded all packed-refs into loose refs). I.e. you would have heads, tags, remotes as subtrees of reference tree, and then e.g. in the heads subtree, there would be an entry named master pointing to a _blob_, and the contents of that blob would be the commit id of the current tip of the master branch.

Obviously the next optimization would be to drop the master -> blob -> commit indirection, and use master -> commit instead, i.e. the master tree entry corresponds directly to the commit to which it points (symrefs would naturally be recorded as symlinks). This would automatically provide reachability for all refs, but as you correctly observe:

Of course in either case we couldn't use a tree object directly, because these new reference tree objects would refer not only to blobs and other trees but also to commits and tags.

Indeed.
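As an illustration of the first alternative (one tree per directory level, with the leaf entry naming the commit directly), here is a small Python sketch. `build_ref_tree` is a hypothetical helper, nested dicts stand in for tree objects, and ref names are assumed to be given relative to .git/refs.

```python
def build_ref_tree(refs):
    """Turn {'heads/master': oid, ...} into nested dicts, one dict per directory level."""
    root = {}
    for refname, oid in refs.items():
        *dirs, leaf = refname.split("/")
        node = root
        for d in dirs:
            # Each path component becomes (or reuses) a subtree.
            node = node.setdefault(d, {})
        # Leaf entry points straight at the object id, without a blob in between.
        node[leaf] = oid
    return root
```

Updating one ref in such a structure only requires new tree objects along the path from the root to that ref, which matches the remark that each change would produce a few objects.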
I don't know if the best solution would be to actually _allow_ that (which would complicate the object parsing code somewhat; a tree entry pointing to a commit is usually interpreted as a submodule, but that is not what we'd want for the ref tree, and a tree entry pointing at a tag has AFAIK not yet been done), or whether it means we need to come up with a different kind of structure.

[I know this is not what you are suggesting, but I am reminded of Subversion, which stores trunk, branches, and tags in the same tree space as the contents of the working trees. A Subversion commit references a gigantic tree encompassing all branches of development and all files on all of those branches (with cheap copies to reduce the redundancy):

    /
    /trunk/
    /trunk/Makefile
    /trunk/src/
    /trunk/src/foo.c
    /branches/
    /branches/next/
    /branches/next/Makefile
    /branches/next/src/
    /branches/next/src/foo.c
    /branches/pu/
    /branches/pu/Makefile
    /branches/pu/src/
    /branches/pu/src/foo.c
    /tags/
    /tags/v1.8.2/
    /tags/v1.8.2/Makefile
    /tags/v1.8.2/src/
    /tags/v1.8.2/src/foo.c
    etc...

A Subversion commit thus describes the state of *every* branch and tag at that moment in time. The model is conceptually very simple (in
Re: Storing refs in the odb
Johan Herland jo...@herland.net writes:

Of course in either case we couldn't use a tree object directly, because these new reference tree objects would refer not only to blobs and other trees but also to commits and tags.

Indeed. I don't know if the best solution would be to actually _allow_ that (which would complicate the object parsing code somewhat; a tree entry pointing to a commit is usually interpreted as a submodule, but that is not what we'd want for the ref tree, and a tree entry pointing at a tag has AFAIK not yet been done), or whether it means we need to come up with a different kind of structure.

You can disallow that only by giving up on being able to express Linus's kernel repository, which has an oddball v2.6.11-tree tag.

I do not think that that particular tag in the particular repository is too big a show-stopper; if it is only Linus, we can ask him to drop that tag (he has v2.6.11 tag object that points at the tree, so the users do not lose anything) and be done with it. But if there are other repositories that tag trees in a similar way, that would be a real regression. We cannot just go ask people to change their workflow that depended on using refs that directly point at trees overnight.

-- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Storing refs in the odb
On Mon, May 20, 2013 at 7:21 PM, Junio C Hamano gits...@pobox.com wrote:

Johan Herland jo...@herland.net writes:

Of course in either case we couldn't use a tree object directly, because these new reference tree objects would refer not only to blobs and other trees but also to commits and tags.

Indeed. I don't know if the best solution would be to actually _allow_ that (which would complicate the object parsing code somewhat; a tree entry pointing to a commit is usually interpreted as a submodule, but that is not what we'd want for the ref tree, and a tree entry pointing at a tag has AFAIK not yet been done), or whether it means we need to come up with a different kind of structure.

You can disallow that only by giving up on being able to express Linus's kernel repository, which has an oddball v2.6.11-tree tag.

I do not think that that particular tag in the particular repository is too big a show-stopper; if it is only Linus, we can ask him to drop that tag (he has v2.6.11 tag object that points at the tree, so the users do not lose anything) and be done with it. But if there are other repositories that tag trees in a similar way, that would be a real regression. We cannot just go ask people to change their workflow that depended on using refs that directly point at trees overnight.

I wasn't considering disallowing _anything_, rather open up to the idea that a tree object might refer to tag objects as well as commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted scheme, I was considering whether the tree entry at the virtual path refs/tags/v1.0 should look like this:

    100644 blob 123456... v1.0

where the blob at 123456... contains the object id of the v1.0 tag object, or whether we should allow the craziness that is:

    ?? tag 987654... v1.0

Just a thought experiment...
...Johan -- Johan Herland, jo...@herland.net www.herland.net
Re: Storing refs in the odb
Johan Herland jo...@herland.net writes:

I wasn't considering disallowing _anything_, rather open up to the idea that a tree object might refer to tag objects as well as commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted scheme, I was considering whether the tree entry at the virtual path refs/tags/v1.0 should look like this:

    100644 blob 123456... v1.0

where the blob at 123456... contains the object id of the v1.0 tag object, or whether we should allow the craziness that is:

    ?? tag 987654... v1.0

Just a thought experiment...

I was reacting to this part of your earlier message:

Of course in either case we couldn't use a tree object directly, because these new reference tree objects would refer not only to blobs and other trees but also to commits and tags.

Indeed. I don't know if the best solution would be to actually _allow_ that (which would complicate the object parsing code somewhat; a tree

You cannot disambiguate, with the thought-experiment in your message I am responding to, between these two:

    ?? tree 987654... v2.6.11-tree
    ?? tree 987654... sub

where the former is a light-weight tag for that tree, while the latter is merely a subhierarchy in refs/sub/hier/archy, but if you disallow v2.6.11-tree, and if you know this kind of tree is only to express the ref hierarchy, then everything is unambiguous (a commit is not a submodule but is a ref that points at a commit, a blob is a ref that points at a blob like refs/tags/junio-gpg-pub, and a tag is a ref that points at the tag).

So it was a workable alternative implementation of refs (I am not saying it is an improvement, with the atomicity and performance implications we already discussed), if we did not have to worry about a light-weight tag that directly points at a tree.
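The ambiguity can be shown with a short Python sketch. It assumes a toy representation that is not git's: `objects` maps a tree id to a list of `(type, id, name)` entries, and `flatten` is a hypothetical reader that treats every tree entry as a subhierarchy and every other entry as a ref.

```python
def flatten(objects, tree_id, prefix="refs"):
    """Walk a ref tree: non-tree entries become refs, tree entries are recursed into."""
    refs = {}
    for obj_type, oid, name in objects[tree_id]:
        path = f"{prefix}/{name}"
        if obj_type == "tree":
            # Ambiguous case: this entry could equally be a light-weight tag
            # pointing at a tree. The entry alone cannot say, so we recurse.
            refs.update(flatten(objects, oid, path))
        else:
            # commit, tag or blob: unambiguously a ref pointing at that object.
            refs[path] = (obj_type, oid)
    return refs
```

With this reader, a tree-valued tag such as v2.6.11-tree never shows up as a ref; its contents are misread as further refs underneath it. The scheme is therefore unambiguous only if refs that point directly at trees are disallowed.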
Re: Storing refs in the odb
On Mon, May 20, 2013 at 8:28 PM, Junio C Hamano gits...@pobox.com wrote:

Johan Herland jo...@herland.net writes:

I wasn't considering disallowing _anything_, rather open up to the idea that a tree object might refer to tag objects as well as commits/trees/blobs. E.g. in my suggested-but-pretty-much-retracted scheme, I was considering whether the tree entry at the virtual path refs/tags/v1.0 should look like this:

    100644 blob 123456... v1.0

where the blob at 123456... contains the object id of the v1.0 tag object, or whether we should allow the craziness that is:

    ?? tag 987654... v1.0

Just a thought experiment...

I was reacting to this part of your earlier message:

Of course in either case we couldn't use a tree object directly, because these new reference tree objects would refer not only to blobs and other trees but also to commits and tags.

Indeed. I don't know if the best solution would be to actually _allow_ that (which would complicate the object parsing code somewhat; a tree

You cannot disambiguate, with the thought-experiment in your message I am responding to, between these two:

    ?? tree 987654... v2.6.11-tree
    ?? tree 987654... sub

where the former is a light-weight tag for that tree, while the latter is merely a subhierarchy in refs/sub/hier/archy, but if you disallow v2.6.11-tree, and if you know this kind of tree is only to express the ref hierarchy, then everything is unambiguous (a commit is not a submodule but is a ref that points at a commit, a blob is a ref that points at a blob like refs/tags/junio-gpg-pub, and a tag is a ref that points at the tag).

So it was a workable alternative implementation of refs (I am not saying it is an improvement, with the atomicity and performance implications we already discussed), if we did not have to worry about a light-weight tag that directly points at a tree.

True, unless we were to abuse the mode bits to differentiate between regular-subtree and ref-to-tree cases...
...Johan -- Johan Herland, jo...@herland.net www.herland.net
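For illustration only, the mode-bit idea could look roughly like the Python sketch below. `SUBTREE_MODE` is the mode git already uses for directory entries; `REF_TO_TREE_MODE` is a made-up value that git does not define, and `classify` is a hypothetical helper.

```python
SUBTREE_MODE = 0o040000       # git's existing mode for a directory (subtree) entry
REF_TO_TREE_MODE = 0o050000   # HYPOTHETICAL value, not a mode git defines


def classify(mode, obj_type):
    """Decide whether a ref-tree entry is a ref or a level of the ref hierarchy."""
    if obj_type != "tree":
        # commit, tag and blob entries are always refs in this scheme.
        return "ref"
    if mode == SUBTREE_MODE:
        return "subhierarchy"
    if mode == REF_TO_TREE_MODE:
        # Light-weight tag pointing directly at a tree, e.g. v2.6.11-tree.
        return "ref"
    raise ValueError(f"unexpected mode {mode:o} for a tree entry")
```

Any such extra mode would have to be understood by every tool that parses tree objects, which is presumably why it is described as abusing the mode bits.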