Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-09 Thread Thomas Gummerer
On 08/08, Junio C Hamano wrote:
 Junio C Hamano gits...@pobox.com writes:
 
  So whether done with sleep or test-chmtime, avoiding a racily
  clean situation sounds like sweeping a bug in the v5 code in racy
  situation under the rug to me (unless I am misunderstanding what
  you are doing with this change and in your explanation, or the test
  was checking a wrong thing, that is).
 
  Even more confused
 
 OK, after staring this test for a long time, and going back to
 3d1f148 (refresh_index: do not show unmerged path that is outside
 pathspec, 2012-02-17), I give up.
 
 Let me ask the same question in a more direct way.  Which part of
 this test break with your series?
 
 test_expect_success 'git add --refresh with pathspec' '
 git reset --hard 
 echo foo  echo bar  echo baz 
 git add foo bar baz  H=$(git rev-parse :foo)  git rm -f 
 foo 
 echo 100644 $H 3 foo | git update-index --index-info 
   # sleep 1  in the update here ...
 test-chmtime -60 bar baz 
 expect 
 git add --refresh bar actual 
 test_cmp expect actual 
 
 git diff-files --name-only actual 
 ! grep bar actual
 grep baz actual
 '
 
 We prepare an index with bunch of paths, we make foo unmerged, we
 smudge bar and baz stat-dirty, so that diff-files would report
 them, even though their contents match what is recorded in the
 index.

After getting confused a bit myself, I now think here is the problem.
The v5 code smudges baz when doing git add --refresh bar.  Therefore
baz isn't considered stat-dirty by the code, but a racily smudged entry
and therefore its content gets checked, thus not showing up in
git diff-files.  The mtime doesn't get checked anymore as it is used
as smudge marker and thus 0.  Adding sleep just avoids smudging the
entry.

The alternative would be to use the size or the crc as smudge marker
but I don't think they are good canidates, as they can still be used
by the reader to avoid checking the filesystem.

Another alternative would be to introduce a CE_SMUDGED flag as it was
suggested by Thomas on irc IIRC, but we chose to use the mtime as
smudge marker instead.

 Then we say git add --refresh bar.  As far as I know, the output
 from git add --refresh pathspec is limited to foo: needs merge
 if and only if foo is covered by pathspec and foo is unmerged.
 
   Side note: If --verbose is given to the same command, we
   also give Unstaged changes after refreshing the index:
   followed by M foo or U foo if foo does not match the
   index but not unmerged, or if foo is unmerged, again if
   and only if foo is covered by pathspec.  But that is not
   how we invoke git add --refresh in this test.
 
 So if you are getting a test failure from the test_cmp, wouldn't it
 mean that your series broke what 3d1f148 did (namely, make sure we
 report only on paths that are covered by pathspec, in this case
 bar), as the contents of bar in the working tree matches what is
 recorded in the index?
 
 If the failure you are seeing is that bar appears in the output of
 git diff-files --name-only, it means that diff-files noticed
 that bar is stat-dirty after git add --refresh bar.  Wouldn't it
 mean that the series broke git add --refresh bar in such a way
 that it does not to refresh what it was told to refresh?
 
 Another test that could fail after the point you added sleep 1 is
 that the output from git diff-files --name-only fails to list
 baz in its output, but with test-chmtime -60 bar baz, we made
 sure that bar and baz are stat-dirty, and we only refreshed
 bar and not baz.  If that is the case, then would it mean that
 the series broke git add --refresh bar in such a way that it
 refreshes something other than what it was told to refresh?

 In any case, having to change this test in any way smells like there
 is some breakage in the series; it is not immediately obvious to me
 that the current test is checking anything wrong as I suspected in
 the earlier message.
 
 So,... I dunno.
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-09 Thread Junio C Hamano
Thomas Gummerer t.gumme...@gmail.com writes:

 On 08/08, Junio C Hamano wrote:
 ...
 Let me ask the same question in a more direct way.  Which part of
 this test break with your series?
 
 test_expect_success 'git add --refresh with pathspec' '
 git reset --hard 
 echo foo  echo bar  echo baz 
 git add foo bar baz  H=$(git rev-parse :foo)  git rm -f 
 foo 
 echo 100644 $H 3foo | git update-index --index-info 
  # sleep 1  in the update here ...
 test-chmtime -60 bar baz 
 expect 
 git add --refresh bar actual 
 test_cmp expect actual 
 
 git diff-files --name-only actual 
 ! grep bar actual
 grep baz actual
 '
 
 We prepare an index with bunch of paths, we make foo unmerged, we
 smudge bar and baz stat-dirty, so that diff-files would report
 them, even though their contents match what is recorded in the
 index.

 After getting confused a bit myself, I now think here is the problem.
 The v5 code smudges baz when doing git add --refresh bar.  Therefore
 baz isn't considered stat-dirty by the code, but a racily smudged entry
 and therefore its content gets checked, thus not showing up in
 git diff-files.

So in short, the breakage is the last one among the three choices I
gave you in my message you are responding to.  The user asked to
refresh bar so that later diff-files won't report a false change
on it, but baz effectively ends up getting refreshed at the same
time and a false change is not reported.

That breakage is, from the correctness point of view, not a
breakage.  As the primary purpose of refreshing is to support
commands that want to rely on a quick ce_modified() call to tell
files that are modified in the working tree since it was last added
to the index---you refresh once, and then you call such commands
many times without having to worry about having to compare the
contents between the indexed objects and the working tree files.

But from the performance point of view, which is the whole point of
refresh, the behaviour of the new code is dubious.  If the user is
working in a large working tree (which automatically means large
index, the primary reason we are doing this v5 experiment), the user
often is working in a deep and narrow subdirectory of it, and a path
limited refresh (the test names a specific file bar, but imagine
it were . to limit it to the directory the user is working in) may
be a cheap way not to bother even checking outside the area the user
currently is working in.  Also, smudging more entries than necessary
to be checked by ce_modified_check_fs() later at runtime may mean
that it defeats the refresh once and then compare cheaply many
times pattern that is employed by existing scripts.

Is the root cause really where the racily-clean so smudge to tell
later runtime to check contents bit goes?  I am hoping that the
issue is not coming from the difference between the current code and
your code when they decide to smudge, what entries they decide to
smudge and based on what condition.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-09 Thread Thomas Gummerer
On 08/09, Junio C Hamano wrote:
 Thomas Gummerer t.gumme...@gmail.com writes:
 
  On 08/08, Junio C Hamano wrote:
  ...
  Let me ask the same question in a more direct way.  Which part of
  this test break with your series?
  
  test_expect_success 'git add --refresh with pathspec' '
  git reset --hard 
  echo foo  echo bar  echo baz 
  git add foo bar baz  H=$(git rev-parse :foo)  git rm 
  -f foo 
  echo 100644 $H 3  foo | git update-index --index-info 
 # sleep 1  in the update here ...
  test-chmtime -60 bar baz 
  expect 
  git add --refresh bar actual 
  test_cmp expect actual 
  
  git diff-files --name-only actual 
  ! grep bar actual
  grep baz actual
  '
  
  We prepare an index with bunch of paths, we make foo unmerged, we
  smudge bar and baz stat-dirty, so that diff-files would report
  them, even though their contents match what is recorded in the
  index.
 
  After getting confused a bit myself, I now think here is the problem.
  The v5 code smudges baz when doing git add --refresh bar.  Therefore
  baz isn't considered stat-dirty by the code, but a racily smudged entry
  and therefore its content gets checked, thus not showing up in
  git diff-files.
 
 So in short, the breakage is the last one among the three choices I
 gave you in my message you are responding to.  The user asked to
 refresh bar so that later diff-files won't report a false change
 on it, but baz effectively ends up getting refreshed at the same
 time and a false change is not reported.

Exactly.

 That breakage is, from the correctness point of view, not a
 breakage.  As the primary purpose of refreshing is to support
 commands that want to rely on a quick ce_modified() call to tell
 files that are modified in the working tree since it was last added
 to the index---you refresh once, and then you call such commands
 many times without having to worry about having to compare the
 contents between the indexed objects and the working tree files.
 
 But from the performance point of view, which is the whole point of
 refresh, the behaviour of the new code is dubious.  If the user is
 working in a large working tree (which automatically means large
 index, the primary reason we are doing this v5 experiment), the user
 often is working in a deep and narrow subdirectory of it, and a path
 limited refresh (the test names a specific file bar, but imagine
 it were . to limit it to the directory the user is working in) may
 be a cheap way not to bother even checking outside the area the user
 currently is working in.

That's true, but once we have the partial reader/writer, we do not
bother checking outside the area the user is currently working in
anyway.

Also and probably more importantly, this will only affect a *very*
small number of entries, because timestamps outside of the directory
in which the user is working in are rarely updated recently and
thus racy.

 Also, smudging more entries than necessary
 to be checked by ce_modified_check_fs() later at runtime may mean
 that it defeats the refresh once and then compare cheaply many
 times pattern that is employed by existing scripts.

The new racy code also calls ce_modified_check_fs() only if the size
and the stat_crc are not changed.  It's true that ce_modified_check_fs()
can be called multiple times, when match_stat_crc() is called, but that
could be solved by adding an additional flag CE_IS_MODIFIED, which
indicates that ce_modified_check_fs() was already run.

 Is the root cause really where the racily-clean so smudge to tell
 later runtime to check contents bit goes?  I am hoping that the
 issue is not coming from the difference between the current code and
 your code when they decide to smudge, what entries they decide to
 smudge and based on what condition.

I just gave it a try using a CE_SMUDGED flag, instead of the mtime
as smudge marker, which which this test works without any problems.
It doesn't work the other way round, the test as the test doesn't
break when using mtime as smudge marker in v2, because we do the
ce_modified_check_fs() test earlier.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-08 Thread Junio C Hamano
Thomas Gummerer t.gumme...@gmail.com writes:

 On 08/05, Junio C Hamano wrote:
 Thomas Gummerer t.gumme...@gmail.com writes:
 
  The new git racy code uses the mtime of cache-entries to smudge
  a racy clean entry, and loads the work, of checking the file-system
 
 -ECANTPARSE.

 The git racy code for index-v5 uses the mtime of the cache-entries as
 smudge markers. The work of checking the file-system is loaded of to
 the reader.

OK, now I can parse, perhaps with either s/is loaded of/f/ or
s/is loaded of/is offloaded/.

Thanks for clarifying the grammar.

But doesn't the current code make it the responsibilty of the reader
to check the contents with ce_modified_check_fs() already?  You may
have switched st_size to st_mtime as the field to mark a racily
clean entry, but it is unclear how that change affects anything.

  if the entry has really changed, off to the reader. This interferes
  with this test, because the entry is racily smudged and thus has
  mtime 0. We wait 1 second to avoid smudging the entry and getting
  correct test results.
 
 Mild NAK, especially it is totally unclear why you even need to muck
 with racy-git check in the current format of the index in the first
 place, and even if it were necessary, it is unclear why this cannot
 be done with test-chmtime.

 The racy-git code needs to be changed, to avoid problems when implementing
 the partial writing for index-v5. Otherwise it could cause problems, when
 we have entries that should be smudged, but are not due to the different
 racy algorithms.

Hrmph.  But if racy detection and checking is now a responsibility
of the later reader, the overall end result should be the same, no?
Perhaps the existing test was checking a wrong thing?

We should not care if the index still has a racily clean entries, or
how that fact is marked in the index entry.  The primary thing we
care about is that we do not mistake an actual change as no change
due to raciness.

So whether done with sleep or test-chmtime, avoiding a racily
clean situation sounds like sweeping a bug in the v5 code in racy
situation under the rug to me (unless I am misunderstanding what
you are doing with this change and in your explanation, or the test
was checking a wrong thing, that is).

Even more confused

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-08 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 So whether done with sleep or test-chmtime, avoiding a racily
 clean situation sounds like sweeping a bug in the v5 code in racy
 situation under the rug to me (unless I am misunderstanding what
 you are doing with this change and in your explanation, or the test
 was checking a wrong thing, that is).

 Even more confused

OK, after staring this test for a long time, and going back to
3d1f148 (refresh_index: do not show unmerged path that is outside
pathspec, 2012-02-17), I give up.

Let me ask the same question in a more direct way.  Which part of
this test break with your series?

test_expect_success 'git add --refresh with pathspec' '
git reset --hard 
echo foo  echo bar  echo baz 
git add foo bar baz  H=$(git rev-parse :foo)  git rm -f foo 

echo 100644 $H 3   foo | git update-index --index-info 
# sleep 1  in the update here ...
test-chmtime -60 bar baz 
expect 
git add --refresh bar actual 
test_cmp expect actual 

git diff-files --name-only actual 
! grep bar actual
grep baz actual
'

We prepare an index with bunch of paths, we make foo unmerged, we
smudge bar and baz stat-dirty, so that diff-files would report
them, even though their contents match what is recorded in the
index.

Then we say git add --refresh bar.  As far as I know, the output
from git add --refresh pathspec is limited to foo: needs merge
if and only if foo is covered by pathspec and foo is unmerged.

Side note: If --verbose is given to the same command, we
also give Unstaged changes after refreshing the index:
followed by M foo or U foo if foo does not match the
index but not unmerged, or if foo is unmerged, again if
and only if foo is covered by pathspec.  But that is not
how we invoke git add --refresh in this test.

So if you are getting a test failure from the test_cmp, wouldn't it
mean that your series broke what 3d1f148 did (namely, make sure we
report only on paths that are covered by pathspec, in this case
bar), as the contents of bar in the working tree matches what is
recorded in the index?

If the failure you are seeing is that bar appears in the output of
git diff-files --name-only, it means that diff-files noticed
that bar is stat-dirty after git add --refresh bar.  Wouldn't it
mean that the series broke git add --refresh bar in such a way
that it does not to refresh what it was told to refresh?

Another test that could fail after the point you added sleep 1 is
that the output from git diff-files --name-only fails to list
baz in its output, but with test-chmtime -60 bar baz, we made
sure that bar and baz are stat-dirty, and we only refreshed
bar and not baz.  If that is the case, then would it mean that
the series broke git add --refresh bar in such a way that it
refreshes something other than what it was told to refresh?

In any case, having to change this test in any way smells like there
is some breakage in the series; it is not immediately obvious to me
that the current test is checking anything wrong as I suspected in
the earlier message.

So,... I dunno.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-07 Thread Thomas Gummerer
On 08/05, Junio C Hamano wrote:
 Thomas Gummerer t.gumme...@gmail.com writes:
 
  The new git racy code uses the mtime of cache-entries to smudge
  a racy clean entry, and loads the work, of checking the file-system
 
 -ECANTPARSE.

The git racy code for index-v5 uses the mtime of the cache-entries as
smudge markers. The work of checking the file-system is loaded of to
the reader.

  if the entry has really changed, off to the reader. This interferes
  with this test, because the entry is racily smudged and thus has
  mtime 0. We wait 1 second to avoid smudging the entry and getting
  correct test results.
 
 Mild NAK, especially it is totally unclear why you even need to muck
 with racy-git check in the current format of the index in the first
 place, and even if it were necessary, it is unclear why this cannot
 be done with test-chmtime.

The racy-git code needs to be changed, to avoid problems when implementing
the partial writing for index-v5. Otherwise it could cause problems, when
we have entries that should be smudged, but are not due to the different
racy algorithms.

I'll do it with test-chmtime in the reroll though.

  Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
  ---
   t/t3700-add.sh |1 +
   1 file changed, 1 insertion(+)
 
  diff --git a/t/t3700-add.sh b/t/t3700-add.sh
  index 874b3a6..4d70805 100755
  --- a/t/t3700-add.sh
  +++ b/t/t3700-add.sh
  @@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' '
  echo foo  echo bar  echo baz 
  git add foo bar baz  H=$(git rev-parse :foo)  git rm -f foo 
  echo 100644 $H 3   foo | git update-index --index-info 
  +   sleep 1 
  test-chmtime -60 bar baz 
  expect 
  git add --refresh bar actual 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-05 Thread Thomas Gummerer
The new git racy code uses the mtime of cache-entries to smudge
a racy clean entry, and loads the work, of checking the file-system
if the entry has really changed, off to the reader. This interferes
with this test, because the entry is racily smudged and thus has
mtime 0. We wait 1 second to avoid smudging the entry and getting
correct test results.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 t/t3700-add.sh |1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 874b3a6..4d70805 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' '
echo foo  echo bar  echo baz 
git add foo bar baz  H=$(git rev-parse :foo)  git rm -f foo 
echo 100644 $H 3   foo | git update-index --index-info 
+   sleep 1 
test-chmtime -60 bar baz 
expect 
git add --refresh bar actual 
-- 
1.7.10.GIT

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

2012-08-05 Thread Junio C Hamano
Thomas Gummerer t.gumme...@gmail.com writes:

 The new git racy code uses the mtime of cache-entries to smudge
 a racy clean entry, and loads the work, of checking the file-system

-ECANTPARSE.

 if the entry has really changed, off to the reader. This interferes
 with this test, because the entry is racily smudged and thus has
 mtime 0. We wait 1 second to avoid smudging the entry and getting
 correct test results.

Mild NAK, especially it is totally unclear why you even need to muck
with racy-git check in the current format of the index in the first
place, and even if it were necessary, it is unclear why this cannot
be done with test-chmtime.

 Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
 ---
  t/t3700-add.sh |1 +
  1 file changed, 1 insertion(+)

 diff --git a/t/t3700-add.sh b/t/t3700-add.sh
 index 874b3a6..4d70805 100755
 --- a/t/t3700-add.sh
 +++ b/t/t3700-add.sh
 @@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' '
   echo foo  echo bar  echo baz 
   git add foo bar baz  H=$(git rev-parse :foo)  git rm -f foo 
   echo 100644 $H 3   foo | git update-index --index-info 
 + sleep 1 
   test-chmtime -60 bar baz 
   expect 
   git add --refresh bar actual 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html