Re: [PATCH v2 1/5] split-index: add tests to demonstrate the racy split index problem

2018-09-28 Thread Junio C Hamano
SZEDER Gábor  writes:

> Junio,
>
> On Thu, Sep 27, 2018 at 02:44:30PM +0200, SZEDER Gábor wrote:
>> diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
>> new file mode 100755
>> index 00..ebde418d7e
>> --- /dev/null
>> +++ b/t/t1701-racy-split-index.sh
>> @@ -0,0 +1,218 @@
>> +#!/bin/sh
>> +
>> +# This test can give false success if your machine is sufficiently
>> +# slow or all trials happened to happen on second boundaries.
>> +
>> +test_description='racy split index'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'setup' '
>> +# Only split the index when the test explicitly says so.
>> +sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&
>
> Please note that this patch adds another use of the environment
> variable GIT_FSMONITOR_TEST, while the topic 'bp/rename-test-env-var'
> is about to rename that variable to GIT_TEST_FSMONITOR.

I think the most sensible thing to do during transition is to set
(or unset) both consistently (and leave a note in t/test-lib.sh near
check_var_migration that t1701 has done so and the dup should be
corrected when transition is over---but that is optional).




Re: [PATCH v2 1/5] split-index: add tests to demonstrate the racy split index problem

2018-09-27 Thread SZEDER Gábor
On Fri, Sep 28, 2018 at 02:48:43AM +0200, SZEDER Gábor wrote:
> Junio,
> 
> On Thu, Sep 27, 2018 at 02:44:30PM +0200, SZEDER Gábor wrote:
> > diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
> > new file mode 100755
> > index 00..ebde418d7e
> > --- /dev/null
> > +++ b/t/t1701-racy-split-index.sh
> > @@ -0,0 +1,218 @@
> > +#!/bin/sh
> > +
> > +# This test can give false success if your machine is sufficiently
> > +# slow or all trials happened to happen on second boundaries.
> > +
> > +test_description='racy split index'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
> > +   # Only split the index when the test explicitly says so.
> > +   sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&
> 
> Please note that this patch adds another use of the environment
> variable GIT_FSMONITOR_TEST, while the topic 'bp/rename-test-env-var'
> is about to rename that variable to GIT_TEST_FSMONITOR.

Hang on for a sec.  I unset GIT_FSMONITOR_TEST in this test, because I
saw that 't1700-split-index.sh' unsets it, and I just followed suit.
But come to think of it, t1700 has to unset it, because some of its
tests check the index's SHA-1 checksum, and the FSMN extension would
interfere with that, of course.  However, that's not an issue for
t1701, because none of its tests care about the index's checksum, so
unsetting GIT_FSMONITOR_TEST is actually unnecessary here...  unless
it could have other side effects that I'm not aware of.



Re: [PATCH v2 1/5] split-index: add tests to demonstrate the racy split index problem

2018-09-27 Thread SZEDER Gábor
Junio,

On Thu, Sep 27, 2018 at 02:44:30PM +0200, SZEDER Gábor wrote:
> diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
> new file mode 100755
> index 00..ebde418d7e
> --- /dev/null
> +++ b/t/t1701-racy-split-index.sh
> @@ -0,0 +1,218 @@
> +#!/bin/sh
> +
> +# This test can give false success if your machine is sufficiently
> +# slow or all trials happened to happen on second boundaries.
> +
> +test_description='racy split index'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> + # Only split the index when the test explicitly says so.
> + sane_unset GIT_TEST_SPLIT_INDEX GIT_FSMONITOR_TEST &&

Please note that this patch adds another use of the environment
variable GIT_FSMONITOR_TEST, while the topic 'bp/rename-test-env-var'
is about to rename that variable to GIT_TEST_FSMONITOR.



[PATCH v2 1/5] split-index: add tests to demonstrate the racy split index problem

2018-09-27 Thread SZEDER Gábor
Ever since the split index feature was introduced [1], refreshing a
split index is prone to a variant of the classic racy git problem.
There are a couple of unrelated tests in the test suite that
occasionally fail when run with 'GIT_TEST_SPLIT_INDEX=yes', but
't1700-split-index.sh', the only test script focusing solely on split
index, has never noticed this issue, because it only cares about how
the index is split under various circumstances and all the different
ways to turn the split index feature on and off.

Add a dedicated test script 't1701-racy-split-index.sh' to exercise
the split index feature in racy situations as well; kind of a
"t0010-racy-git.sh for split index" but with modern style (the tests
do everything in &&-chained list of commands in 'test_expect_...'
blocks, and use 'test_cmp' for more informative output on failure).

The tests cover the following sequences of index splitting, updating,
and racy file modifications, with the last two cases demonstrating the
racy split index problem:

  1. Split the index while adding a racily clean file:

   echo "cached content" >file
   git update-index --split-index --add file
   echo "dirty worktree" >file# size stays the same

 This case already works properly.  Even though the cache entry's
 stat data matches with the modifid file in the worktree,
 subsequent git commands will notice that the (split) index and
 the file have the same mtime, and then will go on to check the
 file's content and notice its dirtiness.

  2. Add a racily clean file to an already split index:

   git update-index --split-index
   echo "cached content" >file
   git update-index --add file
   echo "dirty worktree" >file

 This case already works properly.  After the second 'git
 update-index' writes the newly added file's cache entry to the
 new split index, it basically works in the same way as case #1.

  3. Split the index when it (i.e. the not yet splitted index)
 contains a racily clean cache entry, i.e. an entry whose cached
 stat data matches with the corresponding file in the worktree and
 the cached mtime matches that of the index:

   echo "cached content" >file
   git update-index --add file
   echo "dirty worktree" >file
   # ... wait ...
   git update-index --split-index --add other-file

 This case already works properly.  The shared index is written by
 do_write_index(), i.e. the same function that is responsible for
 writing "regular" and split indexes as well.  This function
 cleverly notices the racily clean cache entry, and writes the
 entry to the new shared index with smudged stat data, i.e. file
 size set to 0.  When subsequent git commands read the index, they
 will notice that the smudged stat data doesn't match with the
 file in the worktree, and then go on to check the file's content
 and notice its dirtiness.

  4. Update the split index when it contains a racily clean cache
 entry:

   git update-index --split-index
   echo "cached content" >file
   git update-index --add file
   echo "dirty worktree" >file
   # ... wait ...
   git update-index --add other-file

 This case already works properly.  After the second 'git
 update-index' the newly added file's cache entry is only stored
 in the split index.  If a cache entry is present in the split
 index (even if it is a replacement of an outdated entry in the
 shared index), then it will always be included in the new split
 index on subsequent split index updates (until the file is
 removed or a new shared index is written), independently from
 whether the entry is racily clean or not.  When do_write_index()
 writes the new split index, it notices the racily clean cache
 entry, and smudges its stat date.  Subsequent git commands
 reading the index will notice the smudged stat data and then go
 on to check the file's content and notice its dirtiness.

  5. Update the split index when a racily clean cache entry is stored
 only in the shared index:

   echo "cached content" >file
   git update-index --split-index --add file
   echo "dirty worktree" >file
   # ... wait ...
   git update-index --add other-file

 This case fails due to the racy split index problem.  In the
 second 'git update-index' prepare_to_write_split_index() decides,
 among other things, which cache entries stored only in the shared
 index should be replaced in the new split index.  Alas, this
 function never looks out for racily clean cache entries, and
 since the file's stat data in the worktree hasn't changed since
 the shared index was written, the entry won't be replaced in the
 new split index.  Consequently, do_write_index() doesn't even get
 this racily clean cache entry, and can't smudge its stat data.
 Subsequent git commands will then see that the