Re: [PATCH 2/2] unpack_entry: do not die when we fail to apply a delta

2013-06-14 Thread Jeff King
On Fri, Jun 14, 2013 at 02:59:00PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >  test_expect_success \
> > +'corruption of delta base reference pointing to wrong object' \
> > +'create_new_pack --delta-base-offset &&
> > + git prune-packed &&
> > + printf "\220\033" | do_corrupt_object $blob_3 2 &&
> 
> Interesting.  You cheated in a different way with a hardcoded
> offset, instead of hardcoded knowledge of where the object name
> is stored in binary in the .idx file ;-)

Yes. We could get it with:

  git show-index <"$pack.idx" |
  cut -d' ' -f1 |
  perl -e '
    # offsets of the pack entries, as listed in the .idx
    @pos = map { chomp; $_ } <>;
    # distance from the third entry back to the first
    my $ofs = $pos[2] - $pos[0];

    # encode $ofs the way pack-objects writes an ofs-delta base:
    # big-endian base-128, continuation bit on all but the last
    # byte, and an off-by-one bump on each continuation byte
    my @bin;
    unshift @bin, $ofs & 127;
    while ($ofs >>= 7) {
      $ofs--;
      unshift @bin, 128 | ($ofs & 127);
    }

    binmode STDOUT;
    print chr for @bin;
  '

if that's not too ugly. Maybe the REF_DELTA one is less ugly, then, as
it would not need to do the packed offset encoding, but could just
convert $blob_1 from hex into binary.
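
For completeness, the REF_DELTA variant could be a sketch like this
(assuming a pack created without --delta-base-offset, and reusing the
same byte offset 2 on the guess that the type/size header is two bytes
there as well):

  # overwrite the ref-delta base name with $blob_1's binary sha1,
  # making the delta point at a valid but wrong base object
  perl -e 'print pack("H*", $ARGV[0])' "$blob_1" |
  do_corrupt_object $blob_3 2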

-Peff


Re: [PATCH 2/2] unpack_entry: do not die when we fail to apply a delta

2013-06-14 Thread Junio C Hamano
Jeff King  writes:

>  test_expect_success \
> +'corruption of delta base reference pointing to wrong object' \
> +'create_new_pack --delta-base-offset &&
> + git prune-packed &&
> + printf "\220\033" | do_corrupt_object $blob_3 2 &&

Interesting.  You cheated in a different way with a hardcoded
offset, instead of hardcoded knowledge of where the object name
is stored in binary in the .idx file ;-)
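
For reference, those two bytes are easy to check against the ofs-delta
varint rules by hand (a quick sketch, not part of the test):

  # "\220" is 0x90: continuation bit set, low bits 0x10; the decoder
  # then computes ((0x10 + 1) << 7) + 0x1b, so the corrupted field
  # claims the base sits 2203 bytes before the delta
  perl -e 'printf "%d\n", ((0x90 & 0x7f) + 1) * 128 + 0x1b'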

> + git cat-file blob $blob_1 >/dev/null &&
> + git cat-file blob $blob_2 >/dev/null &&
> + test_must_fail git cat-file blob $blob_3 >/dev/null'
> +
> +test_expect_success \
> +'... but having a loose copy allows for full recovery' \
> +'mv ${pack}.idx tmp &&
> + git hash-object -t blob -w file_3 &&
> + mv tmp ${pack}.idx &&
> + git cat-file blob $blob_1 > /dev/null &&
> + git cat-file blob $blob_2 > /dev/null &&
> + git cat-file blob $blob_3 > /dev/null'
> +
> +test_expect_success \
> +'... and then a repack "clears" the corruption' \
> +'do_repack --delta-base-offset --no-reuse-delta &&
> + git prune-packed &&
> + git verify-pack ${pack}.pack &&
> + git cat-file blob $blob_1 > /dev/null &&
> + git cat-file blob $blob_2 > /dev/null &&
> + git cat-file blob $blob_3 > /dev/null'

Nice.  Will replace the one I queued yesterday with these two patches.

> +test_expect_success \
>  'corrupting header to have too small output buffer fails unpack' \
>  'create_new_pack &&
>   git prune-packed &&


[PATCH 2/2] unpack_entry: do not die when we fail to apply a delta

2013-06-14 Thread Jeff King
When we try to load an object from disk and fail, our
general strategy is to see if we can get it from somewhere
else (e.g., a loose object). That lets users fix corruption
problems by copying known-good versions of objects into the
object database.

We already handle the case where we were not able to read
the delta from disk. However, when we find that the delta we
read does not apply, we simply die.  This case is harder to
trigger, as corruption in the delta data itself would
trigger a crc error from zlib.  However, a corruption that
pointed us at the wrong delta base might cause it.

We can do the same "fail and try to find the object
elsewhere" trick instead of dying. This not only gives us a
chance to recover, but also puts us on code paths that will
alert the user to the problem (with the current message,
they do not even know which sha1 caused the problem).

Note that unlike some other pack corruptions, we do not
recover automatically from this case when doing a repack.
There is nothing apparently wrong with the delta, as it
points to a valid, accessible object, and we realize the
error only when the resulting size does not match up. And in
theory, one could even have a case where the corrupted size
is the same, and the problem would only be noticed by
recomputing the sha1.

We can get around this by recomputing the deltas with
--no-reuse-delta, which our test does (and this is probably
good advice for anyone recovering from pack corruption).

Signed-off-by: Jeff King 
---
I don't know if it is worth adding a test_expect_failure to show that
pack-objects without "--no-reuse-delta" does not help this case. I
don't have plans to work on it, and I'm not sure that it is a big deal.
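
As an aside, the recovery recipe the new tests model boils down to
something like this in a real repository (a sketch; file_3 stands in
for any known-good copy of the broken object's content):

  # put a known-good copy of the broken object back as a loose object...
  git hash-object -t blob -w file_3 &&
  # ...then rebuild the pack from scratch; repack's -f passes
  # --no-reuse-delta to pack-objects, so the bad delta is recomputed
  git repack -a -d -f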

 sha1_file.c                           | 11 ++++++++++-
 t/t5303-pack-corruption-resilience.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5c08701..d458708 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2135,8 +2135,17 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 			data = patch_delta(base, base_size,
 					   delta_data, delta_size,
 					   &size);
+
+			/*
+			 * We could not apply the delta; warn the user, but keep going.
+			 * Our failure will be noticed either in the next iteration of
+			 * the loop, or if this is the final delta, in the caller when
+			 * we return NULL. Those code paths will take care of making
+			 * a more explicit warning and retrying with another copy of
+			 * the object.
+			 */
 			if (!data)
-				die("failed to apply delta");
+				error("failed to apply delta");
 
 			free(delta_data);
 		}
diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh
index 9cb8172..35926de 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -276,6 +276,33 @@ test_expect_success \
  git cat-file blob $blob_3 > /dev/null'
 
 test_expect_success \
+'corruption of delta base reference pointing to wrong object' \
+'create_new_pack --delta-base-offset &&
+ git prune-packed &&
+ printf "\220\033" | do_corrupt_object $blob_3 2 &&
+ git cat-file blob $blob_1 >/dev/null &&
+ git cat-file blob $blob_2 >/dev/null &&
+ test_must_fail git cat-file blob $blob_3 >/dev/null'
+
+test_expect_success \
+'... but having a loose copy allows for full recovery' \
+'mv ${pack}.idx tmp &&
+ git hash-object -t blob -w file_3 &&
+ mv tmp ${pack}.idx &&
+ git cat-file blob $blob_1 > /dev/null &&
+ git cat-file blob $blob_2 > /dev/null &&
+ git cat-file blob $blob_3 > /dev/null'
+
+test_expect_success \
+'... and then a repack "clears" the corruption' \
+'do_repack --delta-base-offset --no-reuse-delta &&
+ git prune-packed &&
+ git verify-pack ${pack}.pack &&
+ git cat-file blob $blob_1 > /dev/null &&
+ git cat-file blob $blob_2 > /dev/null &&
+ git cat-file blob $blob_3 > /dev/null'
+
+test_expect_success \
 'corrupting header to have too small output buffer fails unpack' \
 'create_new_pack &&
  git prune-packed &&
-- 
1.8.3.rc2.14.g7eee6b3