In this message, a method to simplify and at the same time make more powerful the git abstraction is presented.
I believe that the enhancements I propose make git adhere even more to its "spirit" and make it more intuitive. The proposal makes it much easier to build an SCM over git, obtaining in particular the following advantages: - Blob and tree objects become symmetric - Commit objects are removed (their data is put inside tree objects) - Commit comments are per-file - A tree in a repository looks like a repository itself, with full version information (now only the one mentioned in the commit object has version information) - File and directory renames are tracked - Renames are tracked regardless of the way they are made (even with cp and rm) - Commit comments can be updated at any time by whoever made the change - Doing the "blame" operation is trivial - Minimizing disk space usage (at the expense of speed) by storing diffs is easily doable The basic idea is that rather than having single blob or tree revisions as the base concept, the abstract base unit is the whole set of modifications, with comments, leading to that state. Of course, tracking that would be extremely space-inefficient, so we instead track the current file contents, plus the public key of the author and the hashes of all parents. This is implemented with the following changes to git: - The commit object is removed - Each tree must have a ".git-commit" file that contains the information previously in the commit object (only for immediate children, thus having a ".git-commit" file in each directory), but with the author public key instead of the comments - Each blob will be hashed as the blob contents plus an header in a canonical format that contains data similar to the data in the ".git-commit" file - When checked out, the blob header is put in a C/C++ comment, a # comment, or if the file format is unknown, in an extended attribute or a separate file An example of a C/C++ file with metadata is the following: // @parent<SHA1_OF_PARENT1> @parent<SHA1_OF_PARENT2> // @author<FINGERPRINT_OF_AUTHOR_PUBLIC_KEY> #include <stdlib.h> int main(int argc, char** argv) { printf("Hello, world!\n"); return 0; } Note that @parent<> and @author<> in checked out files are NOT the same of the ones in the repository but are crafted so that there is a single @parent pointing to the repository file and @author is taken from $HOME/.gitrc - When the file is checked in, the header is parsed and removed. * If there is a single parent, its header is added and the resulting buffer is hashed and compared with the parent's hash. If equal, the file is unchanged and not committed. * Otherwise, the header data is added in a canonical format and the buffer is hashed and committed - A new class of objects is added, that is not named by their hash, but rather by a public key (or fingerprint of it), a timestamp and a name. The object is correct if and only if the contents plus name and timestamp are signed with the private key corresponding to public key in the name. Object names are formatted as "<id>/<name>/<args>" where <url> is an uuid or url that makes the <id>/<name> unique, <name> is the name, and <args> is additional data. File names formatted like "git/c/<sha1>" are interpreted as commit comments for object <sha1>. - For storage or network transmission purposes, a binary diff against the parents can be stored instead of the contents af an object. This will of course require to walk the whole history to rebuild it, but smarter schemes are possible (e.g. "keyframes", "jump diffs", etc.). - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html