Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes:

 In other words, why not use something like this?

   write_index: optionally allow broken null sha1s

   Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28)
   added a safety check preventing git from writing null sha1s into the
   index. The intent was to catch errors in other parts of the code that
   might let such an entry slip into the index (or worse, a tree).

   Some existing repositories have some invalid trees that contain null
   sha1s already, though.  Until 4337b58, a common way to clean this up
   would be to use git-filter-branch's index-filter to repair such broken
   entries.  That now fails when filter-branch tries to write out the
   index.

   Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check
   and make it easier to recover from such a history.

I found this version more readable than Peff's (albeit slightly).

 After this patch, do you think (in a separate change) it would make
 sense for cache-tree.c::update_one() to check for null sha1 and error
 out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
 the caveat from the last paragraph.

Hmm, interesting thought.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Jeff King
On Sun, Aug 25, 2013 at 12:54:12PM -0700, Jonathan Nieder wrote:

 [setup split across three tests]

 This is kind of an old-fashioned test, since each step of the setup is
 treated as a separate test assertion.  I don't really mind until we
 get better automation to make it easy to skip or rearrange tests.
 Just for reference, I think the usual way to do this now is

I don't see that splitting it up more hurts this. If we wanted more
automatic rearranging or skipping of tests, we would need tests to
declare dependencies on their setup. And we would need to be able to
declare dependencies on multiple tests, since having a single setup test
does not work in all cases (e.g., sometimes you are testing each step,
and the final step relies on earlier steps).

And I do think splitting it up has more granularity. For example, in the
sequence here:

   test_expect_success 'setup' '
   # create base commits
   ...
 
   # create a commit with bogus null sha1 in the tree
   ...
 
   # We have to make one more commit on top removing the broken
   # entry, since otherwise our index does not match HEAD and
   # filter-branch will complain. We could make the index match
   # HEAD, but doing so would involve writing a null sha1 into
   # the index.
   ...
   '

Right now it is not hard to do step 2. But I could easily imagine a
world in which git-mktree learns to complain about such bogus trees. And
the sequence would become:

  1. create base commits

  2. check that mktree barfs

  3. check that we can override mktree to create a broken tree

  4. clean up tip state

in which case you really want to have individual tests.

I do not care _that_ much, since mktree does not behave that way now,
and the setup is otherwise not that likely to fail.  But I do not think
more granularity actually hurts us, and it can sometimes help (as
described above, but also when something fails, the test output more
clearly pinpoints what happened).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Jeff King
On Sun, Aug 25, 2013 at 11:03:54PM -0700, Junio C Hamano wrote:

 Jonathan Nieder jrnie...@gmail.com writes:
 
  In other words, why not use something like this?
 
  write_index: optionally allow broken null sha1s
 
  Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28)
  added a safety check preventing git from writing null sha1s into the
  index. The intent was to catch errors in other parts of the code that
  might let such an entry slip into the index (or worse, a tree).
 
  Some existing repositories have some invalid trees that contain null
  sha1s already, though.  Until 4337b58, a common way to clean this up
  would be to use git-filter-branch's index-filter to repair such broken
  entries.  That now fails when filter-branch tries to write out the
  index.
 
  Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check
  and make it easier to recover from such a history.
 
 I found this version more readable than Peff's (albeit slightly).

OK. Do you want to apply with Jonathan's wording, then?

There's one subtle thing I didn't mention in the it is already on stack
overflow. If you have a version of git which complains about the null
sha1, then the SO advice is already broken. But if the SO works, then
you do not have a version of git which complains. So why do you care?

And the answer is: you may be pushing to a remote with a version of git
that complains, and which has receive.fsckObjects set (and in many
cases, that remote is GitHub, since we have had that check on for a
while).

I don't know if it is worth spelling that out or not.

  After this patch, do you think (in a separate change) it would make
  sense for cache-tree.c::update_one() to check for null sha1 and error
  out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
  the caveat from the last paragraph.
 
 Hmm, interesting thought.

I think it is worth doing. The main reason I put the original check on
writing to the index is that it more clearly pinpoints the source of the
error. If we just died during git-write-tree, then you know somebody
broke your index, but you don't know which command.

But checking in both places would add extra protection, and would make
possible the relax on read, strict on write policy that filter-branch
wants to do.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I found this version more readable than Peff's (albeit slightly).

 OK. Do you want to apply with Jonathan's wording, then?

I can do that, as it seems all of us are in agreement.

 There's one subtle thing I didn't mention in the it is already on stack
 overflow. If you have a version of git which complains about the null
 sha1, then the SO advice is already broken. But if the SO works, then
 you do not have a version of git which complains. So why do you care?

 And the answer is: you may be pushing to a remote with a version of git
 that complains, and which has receive.fsckObjects set (and in many
 cases, that remote is GitHub, since we have had that check on for a
 while).

 I don't know if it is worth spelling that out or not.

Probably not.

You could aim to correct each and every wrong suggestions on a site
where misguided leads other misguided, but it is a hopeless task.

  After this patch, do you think (in a separate change) it would make
  sense for cache-tree.c::update_one() to check for null sha1 and error
  out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
  the caveat from the last paragraph.
 
 Hmm, interesting thought.

 I think it is worth doing. The main reason I put the original check on
 writing to the index is that it more clearly pinpoints the source of the
 error. If we just died during git-write-tree, then you know somebody
 broke your index, but you don't know which command.

 But checking in both places would add extra protection, and would make
 possible the relax on read, strict on write policy that filter-branch
 wants to do.

Yeah, I agree with all of the above.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Jonathan Nieder
Jeff King wrote:
 On Sun, Aug 25, 2013 at 12:54:12PM -0700, Jonathan Nieder wrote:

 [setup split across three tests]

 This is kind of an old-fashioned test, since each step of the setup is
 treated as a separate test assertion.  I don't really mind until we
 get better automation to make it easy to skip or rearrange tests.
 Just for reference, I think the usual way to do this now is

 I don't see that splitting it up more hurts this. If we wanted more
 automatic rearranging or skipping of tests, we would need tests to
 declare dependencies on their setup. And we would need to be able to
 declare dependencies on multiple tests, since having a single setup test
 does not work in all cases (e.g., sometimes you are testing each step,
 and the final step relies on earlier steps).

Actually dependencies can already be inferred for most test scripts
using the following rule:

Each test depends on all tests with the word setup or the
phrase set up in their title that precede it.

And while there's no existing automation for testing that that's all
the tests rely on (by automatically skipping or reordering tests, or
running non-setup tests in separate sandboxes in parallel), in
practice it is still already useful since it makes it safe to use
GIT_SKIP_TESTS subject to the following constraint:

In each test script, for every setup test skipped, all later
tests are skipped as well.

I don't care as much about GIT_SKIP_TESTS as about being able to
introduce new tests in the middle of a file.

Of course splitting up the setup into 3 steps neither helps nor hurts
that.  What I was complaining about is splitting the filter-branch
from the verification that filter-branch had the right result.

Sorry for the confusion,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Jeff King
On Mon, Aug 26, 2013 at 10:35:01AM -0700, Jonathan Nieder wrote:

  I don't see that splitting it up more hurts this. If we wanted more
  automatic rearranging or skipping of tests, we would need tests to
  declare dependencies on their setup. And we would need to be able to
  declare dependencies on multiple tests, since having a single setup test
  does not work in all cases (e.g., sometimes you are testing each step,
  and the final step relies on earlier steps).
 
 Actually dependencies can already be inferred for most test scripts
 using the following rule:
 
   Each test depends on all tests with the word setup or the
   phrase set up in their title that precede it.

I'd be very surprised if this works in practice on most of our current
test scripts. There are often subtle dependencies on the state left over
from previous tests. Running the script below up through t3800 (at which
point I lost patience) reveals 37 test scripts that are broken. Which is
only about 17%, but we're clearly not quite there yet.

But sure, I agree that is something to strive for. But I think my point
stands; splitting up the setup doesn't hurt as long as you note all of
the tests as dependencies. And by your rule above, the advice would be
each of these tests should say setup in the description. That makes
sense to me, and I don't mind doing it here.

 Of course splitting up the setup into 3 steps neither helps nor hurts
 that.  What I was complaining about is splitting the filter-branch
 from the verification that filter-branch had the right result.

I totally missed your original point, then. I don't mind combining the
last two into a single test. That makes sense to me (I only split them
at all because I added the second one much later during the
development).

-Peff

-- 8 --
#!/usr/bin/perl
#
# Run as perl foo.pl t-basic.sh (or whatever script you want to
# check).

my $script = shift;
my ($script_num) = $script =~ /^(t\d+)/;

# run once to get the list of tests
my @tests = run_tests($script);

# mark some tests as setup tests that will always be run
foreach my $test (@tests) {
if ($test-{desc} =~ /set.?up/i) {
print STDERR marking $test-{num} - $test-{desc} as setup\n;
$test-{setup} = 1;
}
}

# now try each test in isolation, but including setup tests
foreach my $test (@tests) {
$ENV{GIT_SKIP_TESTS} = skip_all_except($script_num, $test, @tests);
run_tests($script) or die failed $test-{num} ($test-{desc})\n;
}

sub run_tests {
my @r;
open(my $fh, '-|', qw(sh), $script);
while ($fh) {
/^ok (\d+) - (.*)/ and push @r, { num = $1, desc = $2 };
}
$? and return ();
return @r;
}

sub skip_all_except {
my $prefix = shift;
my $want = shift;

return join(' ',
map { $prefix.$_-{num} }
grep { !$_-{setup}  $_-{num} != $want-{num} }
@_);
}
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Jeff King
On Mon, Aug 26, 2013 at 09:02:46AM -0700, Junio C Hamano wrote:

  There's one subtle thing I didn't mention in the it is already on stack
  overflow. If you have a version of git which complains about the null
  sha1, then the SO advice is already broken. But if the SO works, then
  you do not have a version of git which complains. So why do you care?
 
  And the answer is: you may be pushing to a remote with a version of git
  that complains, and which has receive.fsckObjects set (and in many
  cases, that remote is GitHub, since we have had that check on for a
  while).
 
  I don't know if it is worth spelling that out or not.
 
 Probably not.
 
 You could aim to correct each and every wrong suggestions on a site
 where misguided leads other misguided, but it is a hopeless task.

I do no think the advice on SO was misguided. My point was that a reader
of the revised commit message mentioning SO may wonder why would people
on SO suggest that? It does not work since we introduced the check in
the first place.

But if neither you nor Jonathan wondered about it, then it is probably
not worth caring.

   After this patch, do you think (in a separate change) it would make
   sense for cache-tree.c::update_one() to check for null sha1 and error
   out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
   the caveat from the last paragraph.
 [...]

Hmm. We are already complaining about the null sha1 in most cases,
because we do not have the object. It is only for submodules that the
bogus entry can make it into a tree.  So I tried this:

diff --git a/cache-tree.c b/cache-tree.c
index 0bbec43..0ec3923 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -343,7 +343,8 @@ static int update_one(struct cache_tree *it,
entlen = pathlen - baselen;
i++;
}
-   if (mode != S_IFGITLINK  !missing_ok  !has_sha1_file(sha1)) 
{
+   if ((mode != S_IFGITLINK  !missing_ok  !has_sha1_file(sha1))
+   || (mode == S_IFGITLINK  is_null_sha1(sha1))) {
strbuf_release(buffer);
return error(invalid object %06o %s for '%.*s',
mode, sha1_to_hex(sha1), entlen+baselen, path);
diff --git a/t/t7009-filter-branch-null-sha1.sh 
b/t/t7009-filter-branch-null-sha1.sh
index 50d5689..59c5edc 100755
--- a/t/t7009-filter-branch-null-sha1.sh
+++ b/t/t7009-filter-branch-null-sha1.sh
@@ -31,10 +31,10 @@ test_expect_success 'filter commands are still checked' '
git commit -a -m back to normal
 '
 
-test_expect_success 'filter commands are still checked' '
+test_expect_success 'complain about trees that are broken after filtering' '
test_must_fail git filter-branch \
--force --prune-empty \
-   --index-filter git rm --cached --ignore-unmatch three.t
+   --index-filter 
 '
 
 test_expect_success 'removing the broken entry works' '

But it doesn't work. The reason is that we do not need to re-make the
cache-tree, because we are just handing back the crap that we got from
the original tree. And I do not think we would want to invalidate the
whole cache tree just on the off chance that it has a problem;
filter-branch is slow enough as it is.

So I think we are better off to let filter-branch run at full speed,
even if optimizations like cache-tree mean we are missing out on
validating all of the data.  If you want to know whether your repo is in
good shape, you should run `git fsck`.

-Peff

PS I got more information from the user who originally experienced this
   problem (and yes, it was a submodule entry). It turns out that he had
   done some unusual stuff with having a git repository inside his Eclipse
   project, and adding it as a sub-project of his build.  So it was
   almost certainly EGit/JGit which wrote the bad entry in the first
   place (and he is making a bug report to EGit folks).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-26 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I'd be very surprised if this works in practice on most of our current
 test scripts. There are often subtle dependencies on the state left over
 from previous tests. Running the script below up through t3800 (at which
 point I lost patience) reveals 37 test scripts that are broken. Which is
 only about 17%, but we're clearly not quite there yet.
 ...

Yes, I agree that in an ideal world with infinite amount of time, we
should strive to make sure that each test is independent from
previous steps, and if we do not have infinite amount of time, we
should make sure we find some (say 5%) time to various code clean-up
effort, in both writing and reviewing such, including tests.

The attached is cute.  Thanks for a food for thought.

 -- 8 --
 #!/usr/bin/perl
 #
 # Run as perl foo.pl t-basic.sh (or whatever script you want to
 # check).

 my $script = shift;
 my ($script_num) = $script =~ /^(t\d+)/;

 # run once to get the list of tests
 my @tests = run_tests($script);

 # mark some tests as setup tests that will always be run
 foreach my $test (@tests) {
   if ($test-{desc} =~ /set.?up/i) {
   print STDERR marking $test-{num} - $test-{desc} as setup\n;
   $test-{setup} = 1;
   }
 }

 # now try each test in isolation, but including setup tests
 foreach my $test (@tests) {
   $ENV{GIT_SKIP_TESTS} = skip_all_except($script_num, $test, @tests);
   run_tests($script) or die failed $test-{num} ($test-{desc})\n;
 }

 sub run_tests {
   my @r;
   open(my $fh, '-|', qw(sh), $script);
   while ($fh) {
   /^ok (\d+) - (.*)/ and push @r, { num = $1, desc = $2 };
   }
   $? and return ();
   return @r;
 }

 sub skip_all_except {
   my $prefix = shift;
   my $want = shift;

   return join(' ',
   map { $prefix.$_-{num} }
   grep { !$_-{setup}  $_-{num} != $want-{num} }
   @_);
 }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] write_index: optionally allow broken null sha1s

2013-08-25 Thread Jeff King
On Sat, Aug 24, 2013 at 11:15:00PM -0700, Jonathan Nieder wrote:

  I was tempted to not involve filter-branch in this commit at all, and
  instead require the user to manually invoke
 
GIT_ALLOW_NULL_SHA1=1 git filter-branch ...
 
  to perform such a filter.  That would be slightly safer, but requires
  some specialized knowledge from the user (and advice on using
  filter-branch to remove such entries already exists on places like
  stackoverflow, and this patch makes it Just Work on recent versions of
  git).
 
 The above few paragraphs explained the most mysterious part of the
 patch to me.  I think they would be fine to include in the commit
 message.

I rolled them into the commit message.

 To avoid pipes involving git commands, since they can losing the exit
 status (and hence information about whether git crashed):
 [...]

I didn't bother, as we do not expect a simple ls-tree to fail (and we
would notice anyway as we check that we have later introduced a broken
tree state). Still, the resulting code is not any harder to read (which
was my main hesitation), so let's do it as you suggest.

-- 8 --
Subject: write_index: optionally allow broken null sha1s

Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) unconditionally prevents git from writing null
sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

However, some repositories have history in which the trees
contain entries with null sha1s. Because the restriction is
unconditional, it can be difficult to work with these older
histories (for example, you cannot even check out the
history, because you are forbidden to write the broken
index). Worst of all, you cannot use git-filter-branch's
index-filter to repair such a broken history, because
filter-branch will try to write an index for each tree.

We could potentially work around this by using a
commit-filter, and munging the tree manually. However,
filter-branch unconditionally writes the index, even if we
only asked for a commit-filter, so this doesn't work. That
would be possible to fix, but figuring out that a
commit-filter is needed (and what it should contain) is
somewhat non-intuitive.

Instead, let's introduce an environment variable for
relaxing this check. We could stop there, and require that
users rewriting such broken history run:

  GIT_ALLOW_NULL_SHA=1 git filter-branch ...

themselves. This is the safest option, as it means
filter-branch will never successfully run on history that
contains null sha1s. However, it is also inconvenient for
the user, as they need to figure out the magic ALLOW_SHA1
variable.

Instead, let's have filter-branch turn on the variable
automatically, but only when checkout out the to-be-filtered
index. This means that further filter commands that touch
the index will still notice and fail, unless they actually
remove the broken entry.

We would still succeed on a filter-branch whose filters do
not touch the index at all (since we complain of the null
sha1 only on writing, not when making a tree out of the
index). This is acceptable, though, as we still print a loud
warning (in addition to warning of the problem via fsck), so
the problem is unlikely to go unnoticed.

Signed-off-by: Jeff King p...@peff.net
---
 git-filter-branch.sh   |  5 ++--
 read-cache.c   | 13 --
 t/t7009-filter-branch-null-sha1.sh | 52 ++
 3 files changed, 66 insertions(+), 4 deletions(-)
 create mode 100755 t/t7009-filter-branch-null-sha1.sh

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index ac2a005..98e8fe4 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -283,11 +283,12 @@ while read commit parents; do
 
case $filter_subdir in
)
-   git read-tree -i -m $commit
+   GIT_ALLOW_NULL_SHA1=1 git read-tree -i -m $commit
;;
*)
# The commit may not have the subdirectory at all
-   err=$(git read-tree -i -m $commit:$filter_subdir 21) || {
+   err=$(GIT_ALLOW_NULL_SHA1=1 \
+ git read-tree -i -m $commit:$filter_subdir 21) || {
if ! git rev-parse -q --verify $commit:$filter_subdir
then
rm -f $GIT_INDEX_FILE
diff --git a/read-cache.c b/read-cache.c
index c3d5e35..83a7414 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,8 +1817,17 @@ int write_index(struct index_state *istate, int newfd)
continue;
if (!ce_uptodate(ce)  is_racy_timestamp(istate, ce))
ce_smudge_racily_clean_entry(ce);
-   if (is_null_sha1(ce-sha1))
-   return error(cache entry has null sha1: %s, ce-name);
+   if (is_null_sha1(ce-sha1)) {
+   static const char msg[] = cache entry has 

Re: [PATCHv2] write_index: optionally allow broken null sha1s

2013-08-25 Thread Jonathan Nieder
On Sun, Aug 25, 2013 at 05:58:18AM -0400, Jeff King wrote:
 On Sat, Aug 24, 2013 at 11:15:00PM -0700, Jonathan Nieder wrote:

 I was tempted to not involve filter-branch in this commit at all, and
 instead require the user to manually invoke

   GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

 to perform such a filter.  That would be slightly safer, but requires
 some specialized knowledge from the user (and advice on using
 filter-branch to remove such entries already exists on places like
 stackoverflow, and this patch makes it Just Work on recent versions of
 git).

 The above few paragraphs explained the most mysterious part of the
 patch to me.  I think they would be fine to include in the commit
 message.

 I rolled them into the commit message.

Hm --- the most useful part was advice on using filter-branch to
remove such entries already exists on places like stackoverflow,
which was dropped.  From my point of view, that is exactly the
motivation of the patch.

I also found the I was tempted to ... That would be slightly safer,
but requires ... structure easier to read.

In other words, why not use something like this?

write_index: optionally allow broken null sha1s

Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28)
added a safety check preventing git from writing null sha1s into the
index. The intent was to catch errors in other parts of the code that
might let such an entry slip into the index (or worse, a tree).

Some existing repositories have some invalid trees that contain null
sha1s already, though.  Until 4337b58, a common way to clean this up
would be to use git-filter-branch's index-filter to repair such broken
entries.  That now fails when filter-branch tries to write out the
index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check
and make it easier to recover from such a history.

It is tempting to not involve filter-branch in this commit at all, and
instead require the user to manually invoke

GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null sha1s.
That would be slightly safer, but requires some specialized knowledge
from the user.  So let's set the GIT_ALLOW_NULL_SHA1 variable
automatically when checking out the to-be-filtered trees.  Advice on
using filter-branch to remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on recent
versions of git.

Further commands that touch the index will still notice and fail,   
unless
they actually remove the broken entries.  A filter-branch whose filters
do not touch the index at all will not error out (since we complain of
the null sha1 only on writing, not when making a tree out of the index),
but this is acceptable, as we still print a loud warning, so the problem
is unlikely to go unnoticed.

With or without such a change,
Reviewed-by: Jonathan Nieder jrnie...@gmail.com

After this patch, do you think (in a separate change) it would make
sense for cache-tree.c::update_one() to check for null sha1 and error
out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
the caveat from the last paragraph.

[...]
 --- /dev/null
 +++ b/t/t7009-filter-branch-null-sha1.sh
 @@ -0,0 +1,52 @@
 +#!/bin/sh
 +
 +test_description='filter-branch removal of trees with null sha1'
 +. ./test-lib.sh
 +
 +test_expect_success 'create base commits' '
 + test_commit one 
 + test_commit two 
 + test_commit three
 +'
 +
 +test_expect_success 'create a commit with a bogus null sha1 in the tree' '
 + {
 + git ls-tree HEAD 
 + printf 16 commit $_z40\\tbroken\\n
 + } broken-tree
 + echo add broken entry msg 
 +
 + tree=$(git mktree broken-tree) 
 + test_tick 
 + commit=$(git commit-tree $tree -p HEAD msg) 
 + git update-ref HEAD $commit
 +'
 +
 +# we have to make one more commit on top removing the broken
 +# entry, since otherwise our index does not match HEAD (and filter-branch 
 will
 +# complain). We could make the index match HEAD, but doing so would involve
 +# writing a null sha1 into the index.
 +test_expect_success 'create a commit dropping the broken entry' '
 + test_tick 
 + git commit -a -m back to normal
 +'

This is kind of an old-fashioned test, since each step of the setup is
treated as a separate test assertion.  I don't really mind until we
get better automation to make it easy to skip or rearrange tests.
Just for reference, I think the usual way to do this now is

test_expect_success 'setup' '
# create base commits
...

# create a commit with bogus null sha1 in the tree
...

# We have to make one more