Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.
Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).
The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:
1. Split the index while adding a racily clean file:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file# size stays the same
This case already works properly. Even though the cache entry's
stat data matches with the modifid file in the worktree,
subsequent git commands will notice that the (split) index and
the file have the same mtime, and then will go on to check the
file's content and notice its dirtiness.
2. Add a racily clean file to an already split index:
git update-index --split-index
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
This case already works properly. After the second 'git
update-index' writes the newly added file's cache entry to the
new split index, it basically works in the same way as case #1.
3. Split the index when it (i.e. the not yet splitted index)
contains a racily clean cache entry, i.e. an entry whose cached
stat data matches with the corresponding file in the worktree and
the cached mtime matches that of the index:
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --split-index --add other-file
This case already works properly. The shared index is written by
do_write_index(), i.e. the same function that is responsible for
writing "regular" and split indexes as well. This function
cleverly notices the racily clean cache entry, and writes the
entry to the new shared index with smudged stat data, i.e. file
size set to 0. When subsequent git commands read the index, they
will notice that the smudged stat data doesn't match with the
file in the worktree, and then go on to check the file's content
and notice its dirtiness.
4. Update the split index when it contains a racily clean cache
entry:
git update-index --split-index
echo "cached content" >file
git update-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --add other-file
This case already works properly. After the second 'git
update-index' the newly added file's cache entry is only stored
in the split index. If a cache entry is present in the split
index (even if it is a replacement of an outdated entry in the
shared index), then it will always be included in the new split
index on subsequent split index updates (until the file is
removed or a new shared index is written), independently from
whether the entry is racily clean or not. When do_write_index()
writes the new split index, it notices the racily clean cache
entry, and smudges its stat date. Subsequent git commands
reading the index will notice the smudged stat data and then go
on to check the file's content and notice its dirtiness.
5. Update the split index when a racily clean cache entry is stored
only in the shared index:
echo "cached content" >file
git update-index --split-index --add file
echo "dirty worktree" >file
# ... wait ...
git update-index --add other-file
This case fails due to the racy split index problem. In the
second 'git update-index' prepare_to_write_split_index() decides,
among other things, which cache entries stored only in the shared
index should be replaced in the new split index. Alas, this
function never looks out for racily clean cache entries, and
since the file's stat data in the worktree hasn't changed since
the shared index was written, the entry won't be replaced in the
new split index. Consequently, do_write_index() doesn't even get
this racily clean cache entry, and can't smudge its stat data.
Subsequent git commands will then see that the