This is my preliminary report after evaluating git; there are some TODO
items left. It's based on git 1.2.2.
- Frank
1. Introduction
Git is a distributed source code management tool. It was originally
developed by Linus Torvalds, when he could no longer use BitKeeper
to manage the Linux kernel sources. It is currently maintained by
Junio C Hamano. It has an active community of users, and has been
adopted by a few other open source projects. Git was designed to
be low-level and fast. Higher-level interfaces are available, the
most popular being Cogito, a simpler command-line front end. Git is
open source, and distributed under the GNU General Public License,
version 2.
2. Version used
Initially, the intention was to use Git version 1.2.4. However,
while remote repository tests seemed to work with this version,
local repository tests did not: clone operations on local
repositories failed to create a usable repository, and subsequent
operations failed. The same problem was observed with version
1.2.3. The same problem was reported by someone else on the
OpenSolaris tools-discuss mailing list, who noted that 1.2.2 did
work. This was indeed true, so 1.2.2 was used for testing.
It has to be noted that a remote repository created with 1.2.4
could not be accessed with 1.2.2, but recreating the repository
fixed that problem.
3. Requirements
This section will look at the requirements as listed in the
Distributed Source Code Management Requirements, version 1.4.
3.1 E0 - Open source
Git is open source, and available under the GPL, version 2.
3.2 E1 - Unbiased and disconnected distribution
Yes, Git works in a distributed fashion, and updates between
distinct repositories are possible. Synchronization between
repositories is done via explicit push/pull operations.
3.3 E2 - Networked operation
Networked operation is supported. Support for ssh connections
is a built-in feature.
3.4 E3 - Interface stability and completeness
The metadata storage data format is claimed to be stable, with
only one incompatible change having occurred in the past. The
incompatibility between versions as noted in section 2 might
point to problems in this area, but they have not been investigated
further.
3.5 E4 - Standard operations and transactions
Rename is supported, but at the metadata level this is a copy
followed by a delete. History is not preserved.
Deletion at the file and directory levels consists of removing
the files at the filesystem level, and then performing a commit.
Delete file, commit, create new file with the same name, commit,
is a supported sequence of operations. After the file is re-created
and committed in the final step, it inherits the history of
the original file.
A reverted deletion by user A, followed by a change to the same
file by user B (from a repository update before the deletion)
works.
It does not appear to be possible to reference deleted files,
except of course when inspecting differences between revisions.
No equivalency errors were found during testing.
3.6 E5 - Per changeset metadata
It is possible to attach metadata to a changeset, via the git tagging
commands.
3.7 C6 - Ease of use
Git is not hard to install. It required some modifications to
the Makefile, but they weren't major. One issue is that it
requires a recent version of GNU diff, with a -L (label) option.
The diff executable name and flags are hardcoded in the C
and shell script source, and had to be changed. Also, the Python
script assumes /usr/bin/python exists, and should use the Python
setup mechanism instead.
Git prefers to have the RCS merge(1) command available, but its
absence is not fatal.
The low-level tools interface is inconsistent at times: long and
short options are mixed, and flags such as -n have a different
meaning for different commands.
Git has a separate command, git-update-index, which updates the
recorded state for a file in the repository (before a commit). Its
use is sometimes confusing, as some commands perform this operation
themselves (sometimes depending on which flags they were passed),
while at other times an explicit git-update-index is required. For
example, the mv command does an implicit update, but the add
command does not.
3.8 C7 - "No dedicated server" operational mode
Git does not require a dedicated server.
3.9 C8 - Tool community health
The Git community is active, and the author actively interacts with
users and developers on the primary Git mailing list. I estimate that
the author will be happy to take patches back, although currently,
Git has a strong connection to the Linux community, which may take
first place.
3.10 C9 - OpenSolaris community implementation expertise
At least one Sun engineer in the OpenSolaris community is active in
the Git community and has worked with it for a while.
3.11 C10 - Interface extensibility
The following hooks are available and run if they are executables
present in the hooks subdirectory of the git configuration directory:
applypatch-msg
pre-applypatch
post-applypatch
pre-commit
post-commit
update
post-update
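To illustrate the "run if they are executables" rule, here is a
minimal sketch that reports which of the hooks listed above would be
active for a given git directory. The function names (active_hooks,
demo) and the throwaway directory layout are my own assumptions for
illustration, not part of Git; the check on execute permission bits
is only an approximation of what Git itself does.

```python
import os
import stat
import tempfile

# Hook names as listed in this report for git 1.2.2.
HOOK_NAMES = [
    "applypatch-msg",
    "pre-applypatch",
    "post-applypatch",
    "pre-commit",
    "post-commit",
    "update",
    "post-update",
]

# Any of the user/group/other execute bits counts as "executable" here.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def active_hooks(git_dir):
    """Return the listed hooks present as executable files in git_dir/hooks."""
    hooks_dir = os.path.join(git_dir, "hooks")
    found = []
    for name in HOOK_NAMES:
        path = os.path.join(hooks_dir, name)
        if os.path.isfile(path) and os.stat(path).st_mode & EXEC_BITS:
            found.append(name)
    return found


def demo():
    """Create a temporary hooks directory with one enabled and one disabled hook."""
    with tempfile.TemporaryDirectory() as git_dir:
        hooks_dir = os.path.join(git_dir, "hooks")
        os.mkdir(hooks_dir)
        for name, mode in (("pre-commit", 0o755), ("post-commit", 0o644)):
            path = os.path.join(hooks_dir, name)
            with open(path, "w") as handle:
                handle.write("#!/bin/sh\nexit 0\n")
            os.chmod(path, mode)
        return active_hooks(git_dir)


if __name__ == "__main__":
    print(demo())
```

In the demo, only the hook that was given execute permission is
reported; the other file is present but would be ignored.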
3.12 C11 - Transactional operations and corruption recovery
I was unable to test this extensively, but the semantics do seem
to be generally well-defined at the lowest repository level. There
is an 'fsck' command to detect, and help recover from, a corrupted
repository.
3.13 C12 - Content generality
Binary files are supported.
3.14 O13 - Partial trees
Partial trees are not supported.
3.15 O14 - Per-file histories
Per-file histories are not supported, but the Git core commands will
extract the revisions that affected this file when asked for the
history of a file.
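The difference between a stored per-file history and a history
derived on request can be shown with a toy model. This is only a
sketch of the idea: the Commit type, the revision names and the
paths are hypothetical, and Git's real data structures are more
involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A tree-wide changeset: one revision id plus the paths it touched."""
    rev: str
    paths: frozenset


def file_history(commits, path):
    """Derive a file's history by filtering the tree-wide commit list."""
    return [commit.rev for commit in commits if path in commit.paths]


# Hypothetical history, oldest first.
COMMITS = [
    Commit("r1", frozenset({"Makefile", "usr/src/a.c"})),
    Commit("r2", frozenset({"usr/src/b.c"})),
    Commit("r3", frozenset({"usr/src/a.c"})),
]

if __name__ == "__main__":
    print(file_history(COMMITS, "usr/src/a.c"))
```

Nothing is stored per file in this model; the history of a file is
simply the subset of tree-wide commits that touched it.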
4. Evaluation
4.1 Test hardware used
psrinfo -v output:
Status of virtual processor 0 as of: 03/31/2006 16:06:17
on-line since 02/23/2006 11:23:45.
The sparcv9 processor operates at 1600 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 03/31/2006 16:06:17
on-line since 02/23/2006 11:23:42.
The sparcv9 processor operates at 1600 MHz,
and has a sparcv9 floating point processor.
uname -a output:
SunOS klomp 5.11 snv_31 sun4u sparc SUNW,Sun-Blade-2500
Tests were run on a local 122G ZFS filesystem.
4.2 Test results
4.2.1 Speed
First commit (git add + git commit) of the OpenSolaris source tree:
1m40s
Clone of a remote repository created by the above commands, from
Menlo Park, CA, US to Amersfoort, Netherlands over SWAN: 6m35s
This matches the line speed usually achieved over this connection,
given that the remote clone operation packs and compresses the
repository when transferring it.
Local clone of the same repository: 2m35s
Local commit of one file in the repository: 9s
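For anyone wanting to repeat these measurements, wall-clock timings
of this kind could be collected with a small helper along the
following lines. This is a sketch, not the method used for the
numbers above; time_command and format_elapsed are hypothetical
names, and the command timed in the example is a placeholder rather
than a Git invocation.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run a command and return (exit status, elapsed wall-clock seconds)."""
    start = time.monotonic()
    completed = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode, time.monotonic() - start


def format_elapsed(seconds):
    """Format seconds in the style used in this report, e.g. 1m40s or 9s."""
    minutes, secs = divmod(int(round(seconds)), 60)
    if minutes:
        return "%dm%02ds" % (minutes, secs)
    return "%ds" % secs


if __name__ == "__main__":
    # Placeholder command; substitute the operation to be measured.
    status, elapsed = time_command([sys.executable, "-c", "pass"])
    print(status, format_elapsed(elapsed))
```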
4.2.2 Conflict resolution
A test harness was used to test the following conflict scenarios:
* Two users each have a clone of a central repository. Both
make a different change to the same line of the same file.
Git correctly signaled this conflict, and directed the user
to resolve this conflict by hand.
* Three users each have a clone of a central repository. Two of
them move the same files to different locations. A third user
renames one of the files in its original directory. All then do
a commit and a push.
Git correctly noticed the rename conflicts and provided a message
with the full renamed paths, prompting the user to resolve the
conflict. For the renamed files, Git appears to pull in the
renamed files from the central repository, and undoes the rename
in the local repository, after which the user has to resolve
the conflict. The user isn't explicitly informed about this
behavior.
A problem with conflict resolution lies with the commit command.
Normally, commit will not deal with added/deleted files that have
not explicitly been marked as such. However, the -a option should
deal with this, and is advertised as:
"Update all paths in the index file. This flag notices files that have
been modified and deleted, but new files you have not told git about
are not affected."
Several tutorials tell you to routinely use the -a option (they
even seem to suggest always using it). However, commit -a will
throw away any conflict information and will happily do a
commit even if there are unresolved conflicts, which is definitely
not the desired result.
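One possible safeguard, for example from the pre-commit hook listed
in section 3.11, would be to scan files for leftover conflict
markers before committing. The sketch below assumes that unresolved
conflicts are left in the working files as merge(1)-style markers
(lines starting with <<<<<<<, a ======= separator, and lines starting
with >>>>>>>). The function names are hypothetical, and I have not
verified this against every way Git can record a conflict.

```python
import os
import tempfile


def has_conflict_markers(path):
    """Return True if the file holds a complete <<<<<<< / ======= / >>>>>>> block."""
    state = "outside"
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if state == "outside" and line.startswith("<<<<<<<"):
                state = "ours"
            elif state == "ours" and line == "=======":
                state = "theirs"
            elif state == "theirs" and line.startswith(">>>>>>>"):
                return True
    return False


def unresolved(paths):
    """Return the subset of paths that still contain conflict markers."""
    return [path for path in paths if has_conflict_markers(path)]


def demo():
    """Write one clean and one conflicted file, and report the conflicted one."""
    with tempfile.TemporaryDirectory() as workdir:
        clean = os.path.join(workdir, "clean.c")
        conflicted = os.path.join(workdir, "conflicted.c")
        with open(clean, "w") as handle:
            handle.write("int x = 1;\n")
        with open(conflicted, "w") as handle:
            handle.write(
                "<<<<<<< mine\nint x = 1;\n=======\nint x = 2;\n>>>>>>> theirs\n"
            )
        return [os.path.basename(p) for p in unresolved([clean, conflicted])]


if __name__ == "__main__":
    print(demo())
```

A hook built on such a check could refuse the commit whenever the
returned list is not empty, regardless of whether -a was used.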
4.3 Source code
The Git source code consists of C, Perl, shell and Python source:
108 .c files (34277 lines), 23 .h files (1411 lines)
10 .perl files (4373 lines)
38 .sh files (5242 lines)
2 .py files (1219 lines)
The coding style in the C files is fairly consistent, but comments
are extremely sparse, so it can be hard to tell what's going on,
especially if some functionality is also present in shell, Perl or
Python files. This mix of languages is inconsistent and makes
contributing harder for third parties.
6. Conclusions
The original goal for Git was to be fast. It certainly achieves that
goal, as it seems to be the fastest SCM I have come across.
It also has an active and enthusiastic community, which gives it
momentum.
The downsides are:
* Needing to go two versions back to find a version that
worked for some very basic operations (e.g. creating
and cloning a repository) is not good.
* The source code is inconsistent in places (language it's
written in), and needs much more documentation. It also
has a lot of hardcoded names in it (diff command and
flags, hook names).
* Documentation is available for all commands, but it can
be sparse.
* Commit -a should not throw away conflict information.
* The update-index command seems counter-intuitive and
inconsistently used amongst the git core commands.
* The flags are sometimes inconsistent from command to
command.
7. Todo:
* Look at filesystem usage more.
* Look at the 'cogito' front end.