Re: [RFC PATCH 2/4] fsck: support refs pointing to lazy objects

2017-07-28 Thread Ben Peart



On 7/27/2017 7:50 PM, Jonathan Tan wrote:

On Thu, 27 Jul 2017 11:59:47 -0700
Junio C Hamano  wrote:


@@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const 
struct object_id *oid,
  
  	obj = parse_object(oid);

if (!obj) {
+   if (repository_format_lazy_object) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+   default_refs++;
+   return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;


At this point, do we know (or can we tell) if this is a missing
object or a file exists as a loose object but is corrupt?  If we
could, it would be nice to do this only for the former to avoid
sweeping a real corruption that is unrelated to the lazy fetch under
the rug.


Before all this is run, there is a check over all loose and packed
objects and I've verified that this check reports failure in
corrupt-object situations (see test below).

It is true that parse_object() cannot report the difference, but since
fsck has already verified non-corruptness, I don't think it needs to
know the difference at this point.


+test_expect_success 'fsck fails on lazy object pointed to by ref' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref, and delete it
+   git -C repo branch mybranch "$A" &&
+   delete_object repo "$A" &&
+
+   test_must_fail git -C repo fsck
+'


And a new test that uses a helper different from delete_object
(perhaps call it corrupt_object?) can be used to make sure that we
complain in that case here.


I agree that object corruption can cause this specific part of the
production code to falsely work. But I think that this specific part of
the code can and should rely on object corruption being checked
elsewhere. (I usually don't like to assume that other components work
and will continue to work, but in this case, I think that fsck checking
for object corruption is very foundational and should be relied upon.)

But if we think that defense "in depth" is a good idea, I have no
problem adding such tests (like the one below).



It's good to know that object corruption is already being checked 
elsewhere in fsck.  I agree that you should be able to rely that as long 
as it is guaranteed to run.  I'd rather see a single location in the 
code that has the logic to detect corrupted objects rather than two 
copies (or worse, two different versions).


Are there also tests for validating the object corruption detection 
code?  If not, adding some (like below) would be great.




---
delete_object () {
rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
}

corrupt_object () {
chmod a+w $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40) 
&&
echo CORRUPT >$1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut 
-c3-40)
}

setup_object_in_reflog () {
rm -rf repo &&
test_create_repo repo &&
test_commit -C repo 1 &&

A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
B=$(git -C repo commit-tree -m b HEAD^{tree}) &&

# Reference $A only from reflog
git -C repo branch mybranch "$A" &&
git -C repo branch -f mybranch "$B"
}

test_expect_success 'lazy object in reflog' '
setup_object_in_reflog &&
delete_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
git -C repo fsck
'

test_expect_success 'corrupt loose object in reflog' '
setup_object_in_reflog &&
corrupt_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'

test_expect_success 'missing packed object in reflog' '
setup_object_in_reflog &&
git -C repo repack -a &&
delete_object repo "$A" &&
chmod a+w repo/.git/objects/pack/*.pack &&
echo CORRUPT >"$(echo repo/.git/objects/pack/*.pack)" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'



Re: [RFC PATCH 2/4] fsck: support refs pointing to lazy objects

2017-07-27 Thread Jonathan Tan
On Thu, 27 Jul 2017 11:59:47 -0700
Junio C Hamano  wrote:

> > @@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const 
> > struct object_id *oid,
> >  
> > obj = parse_object(oid);
> > if (!obj) {
> > +   if (repository_format_lazy_object) {
> > +   /*
> > +* Increment default_refs anyway, because this is a
> > +* valid ref.
> > +*/
> > +   default_refs++;
> > +   return 0;
> > +   }
> > error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
> > errors_found |= ERROR_REACHABLE;
> 
> At this point, do we know (or can we tell) if this is a missing
> object or a file exists as a loose object but is corrupt?  If we
> could, it would be nice to do this only for the former to avoid
> sweeping a real corruption that is unrelated to the lazy fetch under
> the rug.

Before all this is run, there is a check over all loose and packed
objects and I've verified that this check reports failure in
corrupt-object situations (see test below).

It is true that parse_object() cannot report the difference, but since
fsck has already verified non-corruptness, I don't think it needs to
know the difference at this point.

> > +test_expect_success 'fsck fails on lazy object pointed to by ref' '
> > +   rm -rf repo &&
> > +   test_create_repo repo &&
> > +   test_commit -C repo 1 &&
> > +
> > +   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
> > +
> > +   # Reference $A only from ref, and delete it
> > +   git -C repo branch mybranch "$A" &&
> > +   delete_object repo "$A" &&
> > +
> > +   test_must_fail git -C repo fsck
> > +'
> 
> And a new test that uses a helper different from delete_object
> (perhaps call it corrupt_object?) can be used to make sure that we
> complain in that case here.

I agree that object corruption can cause this specific part of the
production code to falsely work. But I think that this specific part of
the code can and should rely on object corruption being checked
elsewhere. (I usually don't like to assume that other components work
and will continue to work, but in this case, I think that fsck checking
for object corruption is very foundational and should be relied upon.)

But if we think that defense "in depth" is a good idea, I have no
problem adding such tests (like the one below).

---
delete_object () {
rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
}

corrupt_object () {
chmod a+w $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut 
-c3-40) &&
echo CORRUPT >$1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut 
-c3-40)
}

setup_object_in_reflog () {
rm -rf repo &&
test_create_repo repo &&
test_commit -C repo 1 &&

A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
B=$(git -C repo commit-tree -m b HEAD^{tree}) &&

# Reference $A only from reflog
git -C repo branch mybranch "$A" &&
git -C repo branch -f mybranch "$B"
}

test_expect_success 'lazy object in reflog' '
setup_object_in_reflog &&
delete_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
git -C repo fsck
'

test_expect_success 'corrupt loose object in reflog' '
setup_object_in_reflog &&
corrupt_object repo "$A" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'

test_expect_success 'missing packed object in reflog' '
setup_object_in_reflog &&
git -C repo repack -a &&
delete_object repo "$A" &&
chmod a+w repo/.git/objects/pack/*.pack &&
echo CORRUPT >"$(echo repo/.git/objects/pack/*.pack)" &&
test_must_fail git -C repo fsck &&
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.lazyobject "arbitrary string" &&
test_must_fail git -C repo fsck
'


Re: [RFC PATCH 2/4] fsck: support refs pointing to lazy objects

2017-07-27 Thread Junio C Hamano
Jonathan Tan  writes:

> Teach fsck to not treat refs with missing targets as an error when
> extensions.lazyobject is set.
>
> For the purposes of warning about no default refs, such refs are still
> treated as legitimate refs.
>
> Signed-off-by: Jonathan Tan 
> ---
>  builtin/fsck.c |  8 
>  t/t0410-lazy-object.sh | 20 
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 1cfb8d98c..e29ff760b 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const 
> struct object_id *oid,
>  
>   obj = parse_object(oid);
>   if (!obj) {
> + if (repository_format_lazy_object) {
> + /*
> +  * Increment default_refs anyway, because this is a
> +  * valid ref.
> +  */
> + default_refs++;
> + return 0;
> + }
>   error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
>   errors_found |= ERROR_REACHABLE;

At this point, do we know (or can we tell) if this is a missing
object or a file exists as a loose object but is corrupt?  If we
could, it would be nice to do this only for the former to avoid
sweeping a real corruption that is unrelated to the lazy fetch under
the rug.

> +test_expect_success 'fsck fails on lazy object pointed to by ref' '
> + rm -rf repo &&
> + test_create_repo repo &&
> + test_commit -C repo 1 &&
> +
> + A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
> +
> + # Reference $A only from ref, and delete it
> + git -C repo branch mybranch "$A" &&
> + delete_object repo "$A" &&
> +
> + test_must_fail git -C repo fsck
> +'

And a new test that uses a helper different from delete_object
(perhaps call it corrupt_object?) can be used to make sure that we
complain in that case here.

> +test_expect_success '...but succeeds if lazyobject is set' '
> + git -C repo config core.repositoryformatversion 1 &&
> + git -C repo config extensions.lazyobject "arbitrary string" &&
> + git -C repo fsck
> +'
> +
>  test_done


[RFC PATCH 2/4] fsck: support refs pointing to lazy objects

2017-07-26 Thread Jonathan Tan
Teach fsck to not treat refs with missing targets as an error when
extensions.lazyobject is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c |  8 
 t/t0410-lazy-object.sh | 20 
 2 files changed, 28 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1cfb8d98c..e29ff760b 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -438,6 +438,14 @@ static int fsck_handle_ref(const char *refname, const 
struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (repository_format_lazy_object) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+   default_refs++;
+   return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
index 36442531f..00e1b4a88 100755
--- a/t/t0410-lazy-object.sh
+++ b/t/t0410-lazy-object.sh
@@ -29,4 +29,24 @@ test_expect_success '...but succeeds if lazyobject is set' '
git -C repo fsck
 '
 
+test_expect_success 'fsck fails on lazy object pointed to by ref' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref, and delete it
+   git -C repo branch mybranch "$A" &&
+   delete_object repo "$A" &&
+
+   test_must_fail git -C repo fsck
+'
+
+test_expect_success '...but succeeds if lazyobject is set' '
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.lazyobject "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.14.0.rc0.400.g1c36432dff-goog