Re: [PATCHv2] write_index: optionally allow broken null sha1s
Jeff King writes: > I'd be very surprised if this works in practice on most of our current > test scripts. There are often subtle dependencies on the state left over > from previous tests. Running the script below up through t3800 (at which > point I lost patience) reveals 37 test scripts that are broken. Which is > only about 17%, but we're clearly not quite there yet. > ... Yes, I agree that in an ideal world with infinite amount of time, we should strive to make sure that each test is independent from previous steps, and if we do not have infinite amount of time, we should make sure we find some (say 5%) time to various code clean-up effort, in both writing and reviewing such, including tests. The attached is cute. Thanks for a food for thought. > -- >8 -- > #!/usr/bin/perl > # > # Run as "perl foo.pl t-basic.sh" (or whatever script you want to > # check). > > my $script = shift; > my ($script_num) = $script =~ /^(t\d+)/; > > # run once to get the list of tests > my @tests = run_tests($script); > > # mark some tests as "setup" tests that will always be run > foreach my $test (@tests) { > if ($test->{desc} =~ /set.?up/i) { > print STDERR "marking $test->{num} - $test->{desc} as setup\n"; > $test->{setup} = 1; > } > } > > # now try each test in isolation, but including setup tests > foreach my $test (@tests) { > $ENV{GIT_SKIP_TESTS} = skip_all_except($script_num, $test, @tests); > run_tests($script) or die "failed $test->{num} ($test->{desc})\n"; > } > > sub run_tests { > my @r; > open(my $fh, '-|', qw(sh), $script); > while (<$fh>) { > /^ok (\d+) - (.*)/ and push @r, { num => $1, desc => $2 }; > } > $? and return (); > return @r; > } > > sub skip_all_except { > my $prefix = shift; > my $want = shift; > > return join(' ', > map { "$prefix.$_->{num}" } > grep { !$_->{setup} && $_->{num} != $want->{num} } > @_); > } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
On Mon, Aug 26, 2013 at 09:02:46AM -0700, Junio C Hamano wrote: > > There's one subtle thing I didn't mention in the "it is already on stack > > overflow". If you have a version of git which complains about the null > > sha1, then the SO advice is already broken. But if the SO works, then > > you do not have a version of git which complains. So why do you care? > > > > And the answer is: you may be pushing to a remote with a version of git > > that complains, and which has receive.fsckObjects set (and in many > > cases, that remote is GitHub, since we have had that check on for a > > while). > > > > I don't know if it is worth spelling that out or not. > > Probably not. > > You could aim to correct each and every wrong suggestions on a site > where misguided leads other misguided, but it is a hopeless task. I do no think the advice on SO was misguided. My point was that a reader of the revised commit message mentioning SO may wonder "why would people on SO suggest that? It does not work since we introduced the check in the first place". But if neither you nor Jonathan wondered about it, then it is probably not worth caring. > >> > After this patch, do you think (in a separate change) it would make > >> > sense for cache-tree.c::update_one() to check for null sha1 and error > >> > out unless GIT_ALLOW_NULL_SHA1 is true? That would let us get rid of > >> > the caveat from the last paragraph. > [...] Hmm. We are already complaining about the null sha1 in most cases, because we do not have the object. It is only for submodules that the bogus entry can make it into a tree. So I tried this: diff --git a/cache-tree.c b/cache-tree.c index 0bbec43..0ec3923 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -343,7 +343,8 @@ static int update_one(struct cache_tree *it, entlen = pathlen - baselen; i++; } - if (mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1)) { + if ((mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1)) + || (mode == S_IFGITLINK && is_null_sha1(sha1))) { strbuf_release(&buffer); return error("invalid object %06o %s for '%.*s'", mode, sha1_to_hex(sha1), entlen+baselen, path); diff --git a/t/t7009-filter-branch-null-sha1.sh b/t/t7009-filter-branch-null-sha1.sh index 50d5689..59c5edc 100755 --- a/t/t7009-filter-branch-null-sha1.sh +++ b/t/t7009-filter-branch-null-sha1.sh @@ -31,10 +31,10 @@ test_expect_success 'filter commands are still checked' ' git commit -a -m "back to normal" ' -test_expect_success 'filter commands are still checked' ' +test_expect_success 'complain about trees that are broken after filtering' ' test_must_fail git filter-branch \ --force --prune-empty \ - --index-filter "git rm --cached --ignore-unmatch three.t" + --index-filter "" ' test_expect_success 'removing the broken entry works' ' But it doesn't work. The reason is that we do not need to re-make the cache-tree, because we are just handing back the crap that we got from the original tree. And I do not think we would want to invalidate the whole cache tree just on the off chance that it has a problem; filter-branch is slow enough as it is. So I think we are better off to let filter-branch run at full speed, even if optimizations like cache-tree mean we are missing out on validating all of the data. If you want to know whether your repo is in good shape, you should run `git fsck`. -Peff PS I got more information from the user who originally experienced this problem (and yes, it was a submodule entry). It turns out that he had done some unusual stuff with having a git repository inside his Eclipse project, and adding it as a sub-project of his build. So it was almost certainly EGit/JGit which wrote the bad entry in the first place (and he is making a bug report to EGit folks). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
On Mon, Aug 26, 2013 at 10:35:01AM -0700, Jonathan Nieder wrote: > > I don't see that splitting it up more hurts this. If we wanted more > > automatic rearranging or skipping of tests, we would need tests to > > declare dependencies on their setup. And we would need to be able to > > declare dependencies on multiple tests, since having a single setup test > > does not work in all cases (e.g., sometimes you are testing each step, > > and the final step relies on earlier steps). > > Actually dependencies can already be inferred for most test scripts > using the following rule: > > Each test depends on all tests with the word "setup" or the > phrase "set up" in their title that precede it. I'd be very surprised if this works in practice on most of our current test scripts. There are often subtle dependencies on the state left over from previous tests. Running the script below up through t3800 (at which point I lost patience) reveals 37 test scripts that are broken. Which is only about 17%, but we're clearly not quite there yet. But sure, I agree that is something to strive for. But I think my point stands; splitting up the setup doesn't hurt as long as you note all of the tests as dependencies. And by your rule above, the advice would be "each of these tests should say "setup" in the description". That makes sense to me, and I don't mind doing it here. > Of course splitting up the setup into 3 steps neither helps nor hurts > that. What I was complaining about is splitting the filter-branch > from the verification that filter-branch had the right result. I totally missed your original point, then. I don't mind combining the last two into a single test. That makes sense to me (I only split them at all because I added the second one much later during the development). -Peff -- >8 -- #!/usr/bin/perl # # Run as "perl foo.pl t-basic.sh" (or whatever script you want to # check). my $script = shift; my ($script_num) = $script =~ /^(t\d+)/; # run once to get the list of tests my @tests = run_tests($script); # mark some tests as "setup" tests that will always be run foreach my $test (@tests) { if ($test->{desc} =~ /set.?up/i) { print STDERR "marking $test->{num} - $test->{desc} as setup\n"; $test->{setup} = 1; } } # now try each test in isolation, but including setup tests foreach my $test (@tests) { $ENV{GIT_SKIP_TESTS} = skip_all_except($script_num, $test, @tests); run_tests($script) or die "failed $test->{num} ($test->{desc})\n"; } sub run_tests { my @r; open(my $fh, '-|', qw(sh), $script); while (<$fh>) { /^ok (\d+) - (.*)/ and push @r, { num => $1, desc => $2 }; } $? and return (); return @r; } sub skip_all_except { my $prefix = shift; my $want = shift; return join(' ', map { "$prefix.$_->{num}" } grep { !$_->{setup} && $_->{num} != $want->{num} } @_); } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
Jeff King wrote: > On Sun, Aug 25, 2013 at 12:54:12PM -0700, Jonathan Nieder wrote: >> [setup split across three tests] >> >> This is kind of an old-fashioned test, since each step of the setup is >> treated as a separate test assertion. I don't really mind until we >> get better automation to make it easy to skip or rearrange tests. >> Just for reference, I think the usual way to do this now is > > I don't see that splitting it up more hurts this. If we wanted more > automatic rearranging or skipping of tests, we would need tests to > declare dependencies on their setup. And we would need to be able to > declare dependencies on multiple tests, since having a single setup test > does not work in all cases (e.g., sometimes you are testing each step, > and the final step relies on earlier steps). Actually dependencies can already be inferred for most test scripts using the following rule: Each test depends on all tests with the word "setup" or the phrase "set up" in their title that precede it. And while there's no existing automation for testing that that's all the tests rely on (by automatically skipping or reordering tests, or running non-setup tests in separate sandboxes in parallel), in practice it is still already useful since it makes it safe to use GIT_SKIP_TESTS subject to the following constraint: In each test script, for every setup test skipped, all later tests are skipped as well. I don't care as much about GIT_SKIP_TESTS as about being able to introduce new tests in the middle of a file. Of course splitting up the setup into 3 steps neither helps nor hurts that. What I was complaining about is splitting the filter-branch from the verification that filter-branch had the right result. Sorry for the confusion, Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
Jeff King writes: >> I found this version more readable than Peff's (albeit slightly). > > OK. Do you want to apply with Jonathan's wording, then? I can do that, as it seems all of us are in agreement. > There's one subtle thing I didn't mention in the "it is already on stack > overflow". If you have a version of git which complains about the null > sha1, then the SO advice is already broken. But if the SO works, then > you do not have a version of git which complains. So why do you care? > > And the answer is: you may be pushing to a remote with a version of git > that complains, and which has receive.fsckObjects set (and in many > cases, that remote is GitHub, since we have had that check on for a > while). > > I don't know if it is worth spelling that out or not. Probably not. You could aim to correct each and every wrong suggestions on a site where misguided leads other misguided, but it is a hopeless task. >> > After this patch, do you think (in a separate change) it would make >> > sense for cache-tree.c::update_one() to check for null sha1 and error >> > out unless GIT_ALLOW_NULL_SHA1 is true? That would let us get rid of >> > the caveat from the last paragraph. >> >> Hmm, interesting thought. > > I think it is worth doing. The main reason I put the original check on > writing to the index is that it more clearly pinpoints the source of the > error. If we just died during git-write-tree, then you know somebody > broke your index, but you don't know which command. > > But checking in both places would add extra protection, and would make > possible the "relax on read, strict on write" policy that filter-branch > wants to do. Yeah, I agree with all of the above. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
On Sun, Aug 25, 2013 at 11:03:54PM -0700, Junio C Hamano wrote: > Jonathan Nieder writes: > > > In other words, why not use something like this? > > > > write_index: optionally allow broken null sha1s > > > > Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) > > added a safety check preventing git from writing null sha1s into the > > index. The intent was to catch errors in other parts of the code that > > might let such an entry slip into the index (or worse, a tree). > > > > Some existing repositories have some invalid trees that contain null > > sha1s already, though. Until 4337b58, a common way to clean this up > > would be to use git-filter-branch's index-filter to repair such broken > > entries. That now fails when filter-branch tries to write out the > > index. > > > > Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check > > and make it easier to recover from such a history. > > I found this version more readable than Peff's (albeit slightly). OK. Do you want to apply with Jonathan's wording, then? There's one subtle thing I didn't mention in the "it is already on stack overflow". If you have a version of git which complains about the null sha1, then the SO advice is already broken. But if the SO works, then you do not have a version of git which complains. So why do you care? And the answer is: you may be pushing to a remote with a version of git that complains, and which has receive.fsckObjects set (and in many cases, that remote is GitHub, since we have had that check on for a while). I don't know if it is worth spelling that out or not. > > After this patch, do you think (in a separate change) it would make > > sense for cache-tree.c::update_one() to check for null sha1 and error > > out unless GIT_ALLOW_NULL_SHA1 is true? That would let us get rid of > > the caveat from the last paragraph. > > Hmm, interesting thought. I think it is worth doing. The main reason I put the original check on writing to the index is that it more clearly pinpoints the source of the error. If we just died during git-write-tree, then you know somebody broke your index, but you don't know which command. But checking in both places would add extra protection, and would make possible the "relax on read, strict on write" policy that filter-branch wants to do. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
On Sun, Aug 25, 2013 at 12:54:12PM -0700, Jonathan Nieder wrote: > [setup split across three tests] > > This is kind of an old-fashioned test, since each step of the setup is > treated as a separate test assertion. I don't really mind until we > get better automation to make it easy to skip or rearrange tests. > Just for reference, I think the usual way to do this now is I don't see that splitting it up more hurts this. If we wanted more automatic rearranging or skipping of tests, we would need tests to declare dependencies on their setup. And we would need to be able to declare dependencies on multiple tests, since having a single setup test does not work in all cases (e.g., sometimes you are testing each step, and the final step relies on earlier steps). And I do think splitting it up has more granularity. For example, in the sequence here: > test_expect_success 'setup' ' > # create base commits > ... > > # create a commit with bogus null sha1 in the tree > ... > > # We have to make one more commit on top removing the broken > # entry, since otherwise our index does not match HEAD and > # filter-branch will complain. We could make the index match > # HEAD, but doing so would involve writing a null sha1 into > # the index. > ... > ' Right now it is not hard to do step 2. But I could easily imagine a world in which git-mktree learns to complain about such bogus trees. And the sequence would become: 1. create base commits 2. check that mktree barfs 3. check that we can override mktree to create a broken tree 4. clean up tip state in which case you really want to have individual tests. I do not care _that_ much, since mktree does not behave that way now, and the setup is otherwise not that likely to fail. But I do not think more granularity actually hurts us, and it can sometimes help (as described above, but also when something fails, the test output more clearly pinpoints what happened). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
Jonathan Nieder writes: > In other words, why not use something like this? > > write_index: optionally allow broken null sha1s > > Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) > added a safety check preventing git from writing null sha1s into the > index. The intent was to catch errors in other parts of the code that > might let such an entry slip into the index (or worse, a tree). > > Some existing repositories have some invalid trees that contain null > sha1s already, though. Until 4337b58, a common way to clean this up > would be to use git-filter-branch's index-filter to repair such broken > entries. That now fails when filter-branch tries to write out the > index. > > Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check > and make it easier to recover from such a history. I found this version more readable than Peff's (albeit slightly). > After this patch, do you think (in a separate change) it would make > sense for cache-tree.c::update_one() to check for null sha1 and error > out unless GIT_ALLOW_NULL_SHA1 is true? That would let us get rid of > the caveat from the last paragraph. Hmm, interesting thought. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] write_index: optionally allow broken null sha1s
On Sun, Aug 25, 2013 at 05:58:18AM -0400, Jeff King wrote: > On Sat, Aug 24, 2013 at 11:15:00PM -0700, Jonathan Nieder wrote: >>> I was tempted to not involve filter-branch in this commit at all, and >>> instead require the user to manually invoke >>> >>> GIT_ALLOW_NULL_SHA1=1 git filter-branch ... >>> >>> to perform such a filter. That would be slightly safer, but requires >>> some specialized knowledge from the user (and advice on using >>> filter-branch to remove such entries already exists on places like >>> stackoverflow, and this patch makes it Just Work on recent versions of >>> git). >> >> The above few paragraphs explained the most mysterious part of the >> patch to me. I think they would be fine to include in the commit >> message. > > I rolled them into the commit message. Hm --- the most useful part was "advice on using filter-branch to remove such entries already exists on places like stackoverflow", which was dropped. From my point of view, that is exactly the motivation of the patch. I also found the "I was tempted to ... That would be slightly safer, but requires ..." structure easier to read. In other words, why not use something like this? write_index: optionally allow broken null sha1s Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28) added a safety check preventing git from writing null sha1s into the index. The intent was to catch errors in other parts of the code that might let such an entry slip into the index (or worse, a tree). Some existing repositories have some invalid trees that contain null sha1s already, though. Until 4337b58, a common way to clean this up would be to use git-filter-branch's index-filter to repair such broken entries. That now fails when filter-branch tries to write out the index. Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check and make it easier to recover from such a history. It is tempting to not involve filter-branch in this commit at all, and instead require the user to manually invoke GIT_ALLOW_NULL_SHA1=1 git filter-branch ... to perform an index-filter on a history with trees with null sha1s. That would be slightly safer, but requires some specialized knowledge from the user. So let's set the GIT_ALLOW_NULL_SHA1 variable automatically when checking out the to-be-filtered trees. Advice on using filter-branch to remove such entries already exists on places like stackoverflow, and this patch makes it Just Work again on recent versions of git. Further commands that touch the index will still notice and fail, unless they actually remove the broken entries. A filter-branch whose filters do not touch the index at all will not error out (since we complain of the null sha1 only on writing, not when making a tree out of the index), but this is acceptable, as we still print a loud warning, so the problem is unlikely to go unnoticed. With or without such a change, Reviewed-by: Jonathan Nieder After this patch, do you think (in a separate change) it would make sense for cache-tree.c::update_one() to check for null sha1 and error out unless GIT_ALLOW_NULL_SHA1 is true? That would let us get rid of the caveat from the last paragraph. [...] > --- /dev/null > +++ b/t/t7009-filter-branch-null-sha1.sh > @@ -0,0 +1,52 @@ > +#!/bin/sh > + > +test_description='filter-branch removal of trees with null sha1' > +. ./test-lib.sh > + > +test_expect_success 'create base commits' ' > + test_commit one && > + test_commit two && > + test_commit three > +' > + > +test_expect_success 'create a commit with a bogus null sha1 in the tree' ' > + { > + git ls-tree HEAD && > + printf "16 commit $_z40\\tbroken\\n" > + } >broken-tree > + echo "add broken entry" >msg && > + > + tree=$(git mktree + test_tick && > + commit=$(git commit-tree $tree -p HEAD + git update-ref HEAD "$commit" > +' > + > +# we have to make one more commit on top removing the broken > +# entry, since otherwise our index does not match HEAD (and filter-branch > will > +# complain). We could make the index match HEAD, but doing so would involve > +# writing a null sha1 into the index. > +test_expect_success 'create a commit dropping the broken entry' ' > + test_tick && > + git commit -a -m "back to normal" > +' This is kind of an old-fashioned test, since each step of the setup is treated as a separate test assertion. I don't really mind until we get better automation to make it easy to skip or rearrange tests. Just for reference, I think the usual way to do this now is test_expect_success 'setup' ' # create base commits ... # create a commit with bogus null sha1 in the tree