This is my preliminary report after evaluating git; there are some TODO items left. It's based on git 1.2.2.

- Frank

1.  Introduction

    Git is a distributed source code management tool. It was originally
    developed by Linus Torvalds, when he could no longer use BitKeeper
    to manage the Linux kernel sources. It is currently maintained by
    Junio C Hamano. It has an active community of users, and has been
    adopted by a few other open source projects. Git was designed to
    be low-level and fast. Higher-level interfaces are available, the
    most popular being Cogito, which offers a GUI. Git is open source,
    and distributed under the GNU General Public License, version 2.

2.  Version used

    Initially, the intention was to use Git version 1.2.4. However,
    while remote repository tests seemed to work with this version,
    local repository tests did not: clone operations on local
    repositories failed to create a usable repository, and subsequent
    operations failed. The same problem was observed with version
    1.2.3, and was also reported by someone else on the OpenSolaris
    tools-discuss mailing list, who noted that 1.2.2 did work. This
    was indeed the case, so 1.2.2 was used for testing.
    It has to be noted that a remote repository created with 1.2.4
    could not be accessed with 1.2.2, but recreating the repository
    fixed that problem.

3.  Requirements

    This section will look at the requirements as listed in the
    Distributed Source Code Management Requirements, version 1.4.

3.1 E0 - Open source

    Git is open source, and available under the GPL, version 2.

3.2 E1 - Unbiased and disconnected distribution

    Yes, Git works in a distributed fashion, and updates between
    distinct repositories are possible. Synchronization between
    repositories is done via explicit push/pull operations.

3.3 E2 - Networked operation

    Networked operation is supported. Support for ssh connections
    is a built-in feature.

3.4 E3 - Interface stability and completeness

    The metadata storage data format is claimed to be stable, with
    only one incompatible change having occurred in the past. The
    incompatibility between versions noted in section 2 might
    point to problems in this area, but this has not been investigated
    further.

3.5 E4 - Standard operations and transactions

    Rename is supported, but at the metadata level this is a copy
    followed by a delete. History is not preserved.

    Deletion at the file and directory levels consists of removing
    the files at the filesystem level, and then performing a commit.

    Delete file, commit, create new file with the same name, commit,
    is a supported sequence of operations. After the file is re-created
    and committed in the final step, it inherits the history of
    the original file.

    A reverted deletion by user A, followed by a change to the same
    file by user B (from a repository update before the deletion)
    works.

    It does not appear to be possible to reference deleted files,
    except of course when inspecting differences between revisions.

    No equivalency errors were found during testing.

3.6 E5 - Per changeset metadata

    It is possible to attach metadata to a changeset, via the git tagging
    commands.

3.7 C6 - Ease of use

    Git is not hard to install. It required some modifications to
    the Makefile, but they weren't major. One issue is that it
    requires a recent version of GNU diff, with a -L (label) option.
    The diff executable name and flags are hardcoded in the C
    and shell script source, and had to be changed. Also, the Python
    script assumes /usr/bin/python exists, and should use the Python
    setup mechanism instead.

    Git likes to have the RCS merge(1) command available, but its
    absence is not fatal.

    The low-level tools interface is inconsistent at times (long/short
    options, flags, like -n, having a different meaning for different
    commands).

    Git has a separate command to maintain some repository state for
    a file, git-update-index, which updates the state for a file in
    the repository (before a commit). Its use is sometimes confusing,
    as some commands perform this operation themselves (sometimes
    depending on which flags they were passed), while at other times
    an explicit git-update-index is required. For example, the mv
    command does an implicit update, but the add command does not.

3.8 C7 - "No dedicated server" operational mode

    Git does not require a dedicated server.

3.9 C8 - Tool community health

    The Git community is active, and the author actively interacts with
    users and developers on the primary Git mailing list. I estimate that
    the author will be happy to take patches back, although currently,
    Git has a strong connection to the Linux community, which may take
    first place.

3.10 C9 - OpenSolaris community implementation expertise

    At least one Sun engineer in the OpenSolaris community is active in
    the Git community and has worked with it for a while.

3.11 C10 - Interface extensibility

    The following hooks are available and run if they are executables
    present in the hooks subdirectory of the git configuration directory:

    applypatch-msg
    pre-applypatch
    post-applypatch

    pre-commit
    post-commit

    update
    post-update
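
    As a rough illustration of the rule above (a hook runs only if it is
    an executable file in the hooks subdirectory), the following sketch
    lists which of the named hooks would be active. The helper name
    active_hooks and the directory passed to it are assumptions made
    for this example; they are not part of Git.

```python
import os

# Hook names exactly as listed in this report.
HOOK_NAMES = [
    "applypatch-msg", "pre-applypatch", "post-applypatch",
    "pre-commit", "post-commit",
    "update", "post-update",
]


def active_hooks(hooks_dir):
    """Return the listed hook names that exist in hooks_dir and are executable.

    hooks_dir is a hypothetical path to the hooks subdirectory of a
    repository's git configuration directory.
    """
    found = []
    for name in HOOK_NAMES:
        path = os.path.join(hooks_dir, name)
        # A hook counts as active only if it is a regular, executable file.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            found.append(name)
    return found
```

    A file that is present but lacks the executable bit is skipped,
    which gives a simple way to disable a hook without deleting it.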

3.12 C11 - Transactional operations and corruption recovery

     I was unable to test this extensively, but the semantics do seem
     to be generally well-defined at the lowest repository level. There
     is an 'fsck' command to recover from a corrupted repository.

3.13 C12 - Content generality

     Binary files are supported.

3.14 O13 - Partial trees

     Partial trees are not supported.

3.15 O14 - Per-file histories

     Per-file histories are not supported, but the Git core commands will
     extract the revisions that affected a given file when asked for the
     history of that file.
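
     The idea of deriving a file's history from repository-wide
     revisions can be shown with a toy model. This is only a sketch of
     the concept, not Git's actual implementation: the commit ids, the
     paths and the file_history helper are all invented for the example.

```python
def file_history(commits, path):
    """Return the ids of commits whose changed-path set contains path.

    commits is an ordered list of (commit_id, changed_paths) pairs, a
    simplified stand-in for a repository-wide revision history.
    """
    return [commit_id for commit_id, changed in commits if path in changed]


# Hypothetical history: three repository-wide commits.
commits = [
    ("c1", {"Makefile", "src/main.c"}),
    ("c2", {"src/util.c"}),
    ("c3", {"src/main.c"}),
]

print(file_history(commits, "src/main.c"))  # ['c1', 'c3']
```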

4.  Evaluation

4.1  Test hardware used

     psrinfo -v output:

        Status of virtual processor 0 as of: 03/31/2006 16:06:17
          on-line since 02/23/2006 11:23:45.
          The sparcv9 processor operates at 1600 MHz,
                and has a sparcv9 floating point processor.
        Status of virtual processor 1 as of: 03/31/2006 16:06:17
          on-line since 02/23/2006 11:23:42.
          The sparcv9 processor operates at 1600 MHz,
                and has a sparcv9 floating point processor.

     uname -a output:

        SunOS klomp 5.11 snv_31 sun4u sparc SUNW,Sun-Blade-2500

     Tests were run on a local 122G ZFS filesystem.

4.2  Test results

4.2.1 Speed

      First commit (git add + git commit) of the OpenSolaris source tree:
      1m40s

      Clone of a remote repository created by the above commands, from
      Menlo Park, CA, US to Amersfoort, Netherlands over SWAN: 6m35s
      This matches the line speed usually achieved over this connection,
      given that the remote clone operation packs and compresses the
      repository when transferring it.

      Local clone of the same repository: 2m35s

      Local commit of one file in the repository: 9s
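
      To compare the timings above more easily, they can be converted
      to seconds. The snippet below is just a convenience sketch; the
      to_seconds helper and the dictionary labels are made up for this
      example, and the values are the ones reported above.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '6m35s' or '9s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


# Timings as reported in this section.
timings = {
    "first commit": "1m40s",
    "remote clone": "6m35s",
    "local clone": "2m35s",
    "single-file commit": "9s",
}

seconds = {name: to_seconds(value) for name, value in timings.items()}
ratio = seconds["remote clone"] / seconds["local clone"]
print(seconds)
print(f"remote clone / local clone: {ratio:.2f}")
```

      In seconds, the first commit took 100, the remote clone 395, the
      local clone 155 and the single-file commit 9, so the remote clone
      took roughly two and a half times as long as the local one.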

4.2.2 Conflict resolution

      A test harness was used to test the following conflict scenarios:

         * Two users each have a clone of a central repository. Both
           make a different change to the same line of the same file.

           Git correctly signaled this conflict, and directed the user
           to resolve this conflict by hand.

         * Three users each have a clone of a central repository. Two
           of them move the same files to different locations. A 3rd user
           renames one of the files in its original directory. All then do
           a commit and a push.

           Git correctly noticed the rename conflicts and provided a message
           with the full renamed paths, prompting the user to resolve the
           conflict. For the renamed files, Git appears to pull in the
           renamed files from the central repository, and undoes the rename
           in the local repository, after which the user has to resolve
           the conflict. The user isn't explicitly informed about this
           behavior.

      A problem with conflict resolution lies with the commit command.
      Normally, commit will not deal with added/deleted files that have
      not explicitly been marked as such. However, the -a option should
      deal with this, and is advertised as:

      "Update all paths in the index file. This flag notices files that have
       been modified and deleted, but new files you have not told git about
       are not affected."

      Several tutorials tell you to routinely use the -a option (they
      even seem to suggest always using it). However, commit -a will
      throw away any conflict information and will happily do a
      commit even if there are unresolved conflicts, which is definitely
      not the desired result.
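
      One possible safeguard is to scan files for leftover conflict
      markers before committing. The sketch below assumes that an
      unresolved merge leaves RCS-style markers ("<<<<<<< " and
      ">>>>>>> ") in the working file; the function name is invented
      for this example. Such a check could in principle be called from
      a pre-commit hook, but whether that hook runs early enough under
      commit -a in this Git version has not been verified.

```python
# Line prefixes assumed to indicate an unresolved merge conflict.
MARKERS = ("<<<<<<< ", ">>>>>>> ")


def conflict_marker_lines(text):
    """Return the 1-based numbers of lines that start with a conflict marker."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(MARKERS)
    ]


# Hypothetical file content left behind by a conflicting merge.
sample = "\n".join([
    "int x;",
    "<<<<<<< ours",
    "int y = 1;",
    "=======",
    "int y = 2;",
    ">>>>>>> theirs",
])

print(conflict_marker_lines(sample))  # [2, 6]
```

      The "=======" separator is deliberately not matched, since such
      a line can also occur in ordinary text files.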


4.3  Source code

     The Git source code consists of C, Perl and Python source:

        108 .c files (34277 lines), 23 .h files (1411 lines)
        10 .perl files (4373 lines)
        38 .sh files (5242 lines)
        2 .py files (1219 lines)
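
     Counts like these can be reproduced with a short script. The
     report does not say how the numbers above were obtained, so the
     following is only one possible approach; count_by_extension is a
     name invented for this example.

```python
import os
from collections import defaultdict


def count_by_extension(root, extensions):
    """Return {extension: (file_count, line_count)} for files under root.

    Only files whose extension is in the given set are counted.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            ext = os.path.splitext(filename)[1]
            if ext not in extensions:
                continue
            # Read in binary mode so unusual encodings cannot cause errors.
            with open(os.path.join(dirpath, filename), "rb") as handle:
                lines = sum(1 for _ in handle)
            totals[ext][0] += 1
            totals[ext][1] += lines
    return {ext: tuple(pair) for ext, pair in totals.items()}
```

     For example, count_by_extension(".", {".c", ".h", ".perl", ".sh",
     ".py"}) run at the top of a source tree would give one
     (files, lines) pair per extension found.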

     The coding style in the C files is fairly consistent, but comments
     are extremely sparse, so it can be hard to tell what's going on,
     especially if some functionality is also present in shell, Perl or
     Python files. This mix of implementation languages is inconsistent
     and makes it harder for 3rd party contributors.
 

6.  Conclusions

    The original goal for Git was to be fast. It certainly achieves that
    goal, as it seems to be the fastest SCM I have come across.

    It also has an active and enthusiastic community, which gives it
    momentum.

    The downsides are:

        * Needing to go two versions back to find a version that
          worked for some very basic operations (e.g. creating
          and cloning a repository) is not good.
        * The source code is inconsistent in places (language it's
          written in), and needs much more documentation. It also
          has a lot of hardcoded names in it (diff command and
          flags, hook names).
        * Documentation is available for all commands, but it can
          be sparse.
        * Commit -a should not throw away conflict information.
        * The update-index command seems counter-intuitive and
          inconsistently used amongst the git core commands.
        * The flags are sometimes inconsistent from command to
          command.

7. Todo:

        * Look at filesystem usage more.
        * Look at the 'cogito' GUI.