Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
On 08/08, Junio C Hamano wrote: Junio C Hamano gits...@pobox.com writes: So whether done with sleep or test-chmtime, avoiding a racily clean situation sounds like sweeping a bug in the v5 code in racy situation under the rug to me (unless I am misunderstanding what you are doing with this change and in your explanation, or the test was checking a wrong thing, that is). Even more confused OK, after staring this test for a long time, and going back to 3d1f148 (refresh_index: do not show unmerged path that is outside pathspec, 2012-02-17), I give up. Let me ask the same question in a more direct way. Which part of this test break with your series? test_expect_success 'git add --refresh with pathspec' ' git reset --hard echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info # sleep 1 in the update here ... test-chmtime -60 bar baz expect git add --refresh bar actual test_cmp expect actual git diff-files --name-only actual ! grep bar actual grep baz actual ' We prepare an index with bunch of paths, we make foo unmerged, we smudge bar and baz stat-dirty, so that diff-files would report them, even though their contents match what is recorded in the index. After getting confused a bit myself, I now think here is the problem. The v5 code smudges baz when doing git add --refresh bar. Therefore baz isn't considered stat-dirty by the code, but a racily smudged entry and therefore its content gets checked, thus not showing up in git diff-files. The mtime doesn't get checked anymore as it is used as smudge marker and thus 0. Adding sleep just avoids smudging the entry. The alternative would be to use the size or the crc as smudge marker but I don't think they are good canidates, as they can still be used by the reader to avoid checking the filesystem. Another alternative would be to introduce a CE_SMUDGED flag as it was suggested by Thomas on irc IIRC, but we chose to use the mtime as smudge marker instead. Then we say git add --refresh bar. As far as I know, the output from git add --refresh pathspec is limited to foo: needs merge if and only if foo is covered by pathspec and foo is unmerged. Side note: If --verbose is given to the same command, we also give Unstaged changes after refreshing the index: followed by M foo or U foo if foo does not match the index but not unmerged, or if foo is unmerged, again if and only if foo is covered by pathspec. But that is not how we invoke git add --refresh in this test. So if you are getting a test failure from the test_cmp, wouldn't it mean that your series broke what 3d1f148 did (namely, make sure we report only on paths that are covered by pathspec, in this case bar), as the contents of bar in the working tree matches what is recorded in the index? If the failure you are seeing is that bar appears in the output of git diff-files --name-only, it means that diff-files noticed that bar is stat-dirty after git add --refresh bar. Wouldn't it mean that the series broke git add --refresh bar in such a way that it does not to refresh what it was told to refresh? Another test that could fail after the point you added sleep 1 is that the output from git diff-files --name-only fails to list baz in its output, but with test-chmtime -60 bar baz, we made sure that bar and baz are stat-dirty, and we only refreshed bar and not baz. If that is the case, then would it mean that the series broke git add --refresh bar in such a way that it refreshes something other than what it was told to refresh? In any case, having to change this test in any way smells like there is some breakage in the series; it is not immediately obvious to me that the current test is checking anything wrong as I suspected in the earlier message. So,... I dunno. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
Thomas Gummerer t.gumme...@gmail.com writes: On 08/08, Junio C Hamano wrote: ... Let me ask the same question in a more direct way. Which part of this test break with your series? test_expect_success 'git add --refresh with pathspec' ' git reset --hard echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3foo | git update-index --index-info # sleep 1 in the update here ... test-chmtime -60 bar baz expect git add --refresh bar actual test_cmp expect actual git diff-files --name-only actual ! grep bar actual grep baz actual ' We prepare an index with bunch of paths, we make foo unmerged, we smudge bar and baz stat-dirty, so that diff-files would report them, even though their contents match what is recorded in the index. After getting confused a bit myself, I now think here is the problem. The v5 code smudges baz when doing git add --refresh bar. Therefore baz isn't considered stat-dirty by the code, but a racily smudged entry and therefore its content gets checked, thus not showing up in git diff-files. So in short, the breakage is the last one among the three choices I gave you in my message you are responding to. The user asked to refresh bar so that later diff-files won't report a false change on it, but baz effectively ends up getting refreshed at the same time and a false change is not reported. That breakage is, from the correctness point of view, not a breakage. As the primary purpose of refreshing is to support commands that want to rely on a quick ce_modified() call to tell files that are modified in the working tree since it was last added to the index---you refresh once, and then you call such commands many times without having to worry about having to compare the contents between the indexed objects and the working tree files. But from the performance point of view, which is the whole point of refresh, the behaviour of the new code is dubious. If the user is working in a large working tree (which automatically means large index, the primary reason we are doing this v5 experiment), the user often is working in a deep and narrow subdirectory of it, and a path limited refresh (the test names a specific file bar, but imagine it were . to limit it to the directory the user is working in) may be a cheap way not to bother even checking outside the area the user currently is working in. Also, smudging more entries than necessary to be checked by ce_modified_check_fs() later at runtime may mean that it defeats the refresh once and then compare cheaply many times pattern that is employed by existing scripts. Is the root cause really where the racily-clean so smudge to tell later runtime to check contents bit goes? I am hoping that the issue is not coming from the difference between the current code and your code when they decide to smudge, what entries they decide to smudge and based on what condition. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
On 08/09, Junio C Hamano wrote: Thomas Gummerer t.gumme...@gmail.com writes: On 08/08, Junio C Hamano wrote: ... Let me ask the same question in a more direct way. Which part of this test break with your series? test_expect_success 'git add --refresh with pathspec' ' git reset --hard echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info # sleep 1 in the update here ... test-chmtime -60 bar baz expect git add --refresh bar actual test_cmp expect actual git diff-files --name-only actual ! grep bar actual grep baz actual ' We prepare an index with bunch of paths, we make foo unmerged, we smudge bar and baz stat-dirty, so that diff-files would report them, even though their contents match what is recorded in the index. After getting confused a bit myself, I now think here is the problem. The v5 code smudges baz when doing git add --refresh bar. Therefore baz isn't considered stat-dirty by the code, but a racily smudged entry and therefore its content gets checked, thus not showing up in git diff-files. So in short, the breakage is the last one among the three choices I gave you in my message you are responding to. The user asked to refresh bar so that later diff-files won't report a false change on it, but baz effectively ends up getting refreshed at the same time and a false change is not reported. Exactly. That breakage is, from the correctness point of view, not a breakage. As the primary purpose of refreshing is to support commands that want to rely on a quick ce_modified() call to tell files that are modified in the working tree since it was last added to the index---you refresh once, and then you call such commands many times without having to worry about having to compare the contents between the indexed objects and the working tree files. But from the performance point of view, which is the whole point of refresh, the behaviour of the new code is dubious. If the user is working in a large working tree (which automatically means large index, the primary reason we are doing this v5 experiment), the user often is working in a deep and narrow subdirectory of it, and a path limited refresh (the test names a specific file bar, but imagine it were . to limit it to the directory the user is working in) may be a cheap way not to bother even checking outside the area the user currently is working in. That's true, but once we have the partial reader/writer, we do not bother checking outside the area the user is currently working in anyway. Also and probably more importantly, this will only affect a *very* small number of entries, because timestamps outside of the directory in which the user is working in are rarely updated recently and thus racy. Also, smudging more entries than necessary to be checked by ce_modified_check_fs() later at runtime may mean that it defeats the refresh once and then compare cheaply many times pattern that is employed by existing scripts. The new racy code also calls ce_modified_check_fs() only if the size and the stat_crc are not changed. It's true that ce_modified_check_fs() can be called multiple times, when match_stat_crc() is called, but that could be solved by adding an additional flag CE_IS_MODIFIED, which indicates that ce_modified_check_fs() was already run. Is the root cause really where the racily-clean so smudge to tell later runtime to check contents bit goes? I am hoping that the issue is not coming from the difference between the current code and your code when they decide to smudge, what entries they decide to smudge and based on what condition. I just gave it a try using a CE_SMUDGED flag, instead of the mtime as smudge marker, which which this test works without any problems. It doesn't work the other way round, the test as the test doesn't break when using mtime as smudge marker in v2, because we do the ce_modified_check_fs() test earlier. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
Thomas Gummerer t.gumme...@gmail.com writes: On 08/05, Junio C Hamano wrote: Thomas Gummerer t.gumme...@gmail.com writes: The new git racy code uses the mtime of cache-entries to smudge a racy clean entry, and loads the work, of checking the file-system -ECANTPARSE. The git racy code for index-v5 uses the mtime of the cache-entries as smudge markers. The work of checking the file-system is loaded of to the reader. OK, now I can parse, perhaps with either s/is loaded of/f/ or s/is loaded of/is offloaded/. Thanks for clarifying the grammar. But doesn't the current code make it the responsibilty of the reader to check the contents with ce_modified_check_fs() already? You may have switched st_size to st_mtime as the field to mark a racily clean entry, but it is unclear how that change affects anything. if the entry has really changed, off to the reader. This interferes with this test, because the entry is racily smudged and thus has mtime 0. We wait 1 second to avoid smudging the entry and getting correct test results. Mild NAK, especially it is totally unclear why you even need to muck with racy-git check in the current format of the index in the first place, and even if it were necessary, it is unclear why this cannot be done with test-chmtime. The racy-git code needs to be changed, to avoid problems when implementing the partial writing for index-v5. Otherwise it could cause problems, when we have entries that should be smudged, but are not due to the different racy algorithms. Hrmph. But if racy detection and checking is now a responsibility of the later reader, the overall end result should be the same, no? Perhaps the existing test was checking a wrong thing? We should not care if the index still has a racily clean entries, or how that fact is marked in the index entry. The primary thing we care about is that we do not mistake an actual change as no change due to raciness. So whether done with sleep or test-chmtime, avoiding a racily clean situation sounds like sweeping a bug in the v5 code in racy situation under the rug to me (unless I am misunderstanding what you are doing with this change and in your explanation, or the test was checking a wrong thing, that is). Even more confused -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
Junio C Hamano gits...@pobox.com writes: So whether done with sleep or test-chmtime, avoiding a racily clean situation sounds like sweeping a bug in the v5 code in racy situation under the rug to me (unless I am misunderstanding what you are doing with this change and in your explanation, or the test was checking a wrong thing, that is). Even more confused OK, after staring this test for a long time, and going back to 3d1f148 (refresh_index: do not show unmerged path that is outside pathspec, 2012-02-17), I give up. Let me ask the same question in a more direct way. Which part of this test break with your series? test_expect_success 'git add --refresh with pathspec' ' git reset --hard echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info # sleep 1 in the update here ... test-chmtime -60 bar baz expect git add --refresh bar actual test_cmp expect actual git diff-files --name-only actual ! grep bar actual grep baz actual ' We prepare an index with bunch of paths, we make foo unmerged, we smudge bar and baz stat-dirty, so that diff-files would report them, even though their contents match what is recorded in the index. Then we say git add --refresh bar. As far as I know, the output from git add --refresh pathspec is limited to foo: needs merge if and only if foo is covered by pathspec and foo is unmerged. Side note: If --verbose is given to the same command, we also give Unstaged changes after refreshing the index: followed by M foo or U foo if foo does not match the index but not unmerged, or if foo is unmerged, again if and only if foo is covered by pathspec. But that is not how we invoke git add --refresh in this test. So if you are getting a test failure from the test_cmp, wouldn't it mean that your series broke what 3d1f148 did (namely, make sure we report only on paths that are covered by pathspec, in this case bar), as the contents of bar in the working tree matches what is recorded in the index? If the failure you are seeing is that bar appears in the output of git diff-files --name-only, it means that diff-files noticed that bar is stat-dirty after git add --refresh bar. Wouldn't it mean that the series broke git add --refresh bar in such a way that it does not to refresh what it was told to refresh? Another test that could fail after the point you added sleep 1 is that the output from git diff-files --name-only fails to list baz in its output, but with test-chmtime -60 bar baz, we made sure that bar and baz are stat-dirty, and we only refreshed bar and not baz. If that is the case, then would it mean that the series broke git add --refresh bar in such a way that it refreshes something other than what it was told to refresh? In any case, having to change this test in any way smells like there is some breakage in the series; it is not immediately obvious to me that the current test is checking anything wrong as I suspected in the earlier message. So,... I dunno. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
On 08/05, Junio C Hamano wrote: Thomas Gummerer t.gumme...@gmail.com writes: The new git racy code uses the mtime of cache-entries to smudge a racy clean entry, and loads the work, of checking the file-system -ECANTPARSE. The git racy code for index-v5 uses the mtime of the cache-entries as smudge markers. The work of checking the file-system is loaded of to the reader. if the entry has really changed, off to the reader. This interferes with this test, because the entry is racily smudged and thus has mtime 0. We wait 1 second to avoid smudging the entry and getting correct test results. Mild NAK, especially it is totally unclear why you even need to muck with racy-git check in the current format of the index in the first place, and even if it were necessary, it is unclear why this cannot be done with test-chmtime. The racy-git code needs to be changed, to avoid problems when implementing the partial writing for index-v5. Otherwise it could cause problems, when we have entries that should be smudged, but are not due to the different racy algorithms. I'll do it with test-chmtime in the reroll though. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- t/t3700-add.sh |1 + 1 file changed, 1 insertion(+) diff --git a/t/t3700-add.sh b/t/t3700-add.sh index 874b3a6..4d70805 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' ' echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info + sleep 1 test-chmtime -60 bar baz expect git add --refresh bar actual -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
The new git racy code uses the mtime of cache-entries to smudge a racy clean entry, and loads the work, of checking the file-system if the entry has really changed, off to the reader. This interferes with this test, because the entry is racily smudged and thus has mtime 0. We wait 1 second to avoid smudging the entry and getting correct test results. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- t/t3700-add.sh |1 + 1 file changed, 1 insertion(+) diff --git a/t/t3700-add.sh b/t/t3700-add.sh index 874b3a6..4d70805 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' ' echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info + sleep 1 test-chmtime -60 bar baz expect git add --refresh bar actual -- 1.7.10.GIT -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code
Thomas Gummerer t.gumme...@gmail.com writes: The new git racy code uses the mtime of cache-entries to smudge a racy clean entry, and loads the work, of checking the file-system -ECANTPARSE. if the entry has really changed, off to the reader. This interferes with this test, because the entry is racily smudged and thus has mtime 0. We wait 1 second to avoid smudging the entry and getting correct test results. Mild NAK, especially it is totally unclear why you even need to muck with racy-git check in the current format of the index in the first place, and even if it were necessary, it is unclear why this cannot be done with test-chmtime. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- t/t3700-add.sh |1 + 1 file changed, 1 insertion(+) diff --git a/t/t3700-add.sh b/t/t3700-add.sh index 874b3a6..4d70805 100755 --- a/t/t3700-add.sh +++ b/t/t3700-add.sh @@ -184,6 +184,7 @@ test_expect_success 'git add --refresh with pathspec' ' echo foo echo bar echo baz git add foo bar baz H=$(git rev-parse :foo) git rm -f foo echo 100644 $H 3 foo | git update-index --index-info + sleep 1 test-chmtime -60 bar baz expect git add --refresh bar actual -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html