Re: [RFC PATCH 3/4] fsck: support referenced lazy objects

2017-07-29 Thread Junio C Hamano
Jonathan Tan  writes:

> Teach fsck to not treat missing objects indirectly pointed to by refs as
> an error when extensions.lazyobject is set.

I forgot to mention a potential flaw in this approach in my previous
message.

If you are a pure sightseer, then this is perfectly fine.  The
object store in your local Git client working in that mode is purely
a cache, lazily populated while browsing the object store backed by
the source of what lazy-object "hook" talks with.  As long as that
cache does not give us a corrupt object, we are OK, because missing
objects do not matter.

But once you start using the repository as more than a sightseer,
you will have objects that only exist in your local "cache" and are
not yet in that backing store behind the lazy-object "hook".  You
need to notice it when any of them goes corrupt or missing, or your
next "git push" to send them over to a remote location will fail by
definition because you are the only one with these objects.

If we had the "promise" thing, then we could say that it is OK if
traversal terminated at a "promised but not fetched yet" boundary,
but we cannot afford the "promise", and more importantly, I do not
think "promise" has to be the only approach to ensure that the
objects that exist only in the local repository are all connected.

For example, if we know that the remote 'origin' is the actual
backing store lazy-object "hook" talks with, a validation rule to
ensure that we haven't lost any local commit is to ensure that a
traversal from our local branch tips down to remote-tracking
branches taken from 'origin' must not hit _any_ missing commit.

That covers only the commit objects.  I do not know offhand if we
can and how we extend this concept to protect the tags, trees and
blobs we have locally generated and haven't pushed out, but you and
Ben hopefully can come up with ways to cover them.





Re: [RFC PATCH 3/4] fsck: support referenced lazy objects

2017-07-27 Thread Jonathan Tan
On Thu, 27 Jul 2017 12:17:37 -0700
Junio C Hamano  wrote:

> The same comment as 2/4 applies here.

Noted - whatever the resolution is, I'll apply it to all the patches.
> 
> > @@ -212,6 +221,8 @@ static void check_reachable_object(struct object *obj)
> >  * do a full fsck
> >  */
> > if (!(obj->flags & HAS_OBJ)) {
> > +   if (repository_format_lazy_object)
> > +   return;
> > if (has_sha1_pack(obj->oid.hash))
> > return; /* it is in pack - forget about it */
> > printf("missing %s %s\n", printable_type(obj),
> 
> Also this reminds as a related issue.  Imagine:
> 
>  - An object X was once retrieved, perhaps but not necessarily
>lazily, together with another object Y that is referred to by X
>(e.g. X is a tree, Y is a blob in the directory at path D, which
>is represented by X).
> 
>  - The same blob Y is added to the index in a different directory at
>path E.
> 
>  - The user decides to make this a slimmed-down "narrow clone" style
>repository and tells Git that path D is not interesting.  We lose
>X, but not Y because Y is still referenced from the index.
> 
>  - "git reset --hard" happens, and there no longer is any reference
>to Y.
> 
> Now, when we run fsck, should we diagnose Y as "unreachable and/or
> dangling"?

I would say yes (or whatever happens in the case where we re-fetch into
a shallow clone).

Come to think of it..."git reset --hard" always has the potential to
create unreachable objects, right (regardless of whether it's a "shallow
clone" or "narrow clone" or ordinary clone)?


Re: [RFC PATCH 3/4] fsck: support referenced lazy objects

2017-07-27 Thread Junio C Hamano
Jonathan Tan  writes:

> Teach fsck to not treat missing objects indirectly pointed to by refs as
> an error when extensions.lazyobject is set.
>
> Signed-off-by: Jonathan Tan 
> ---
>  builtin/fsck.c | 11 +++
>  t/t0410-lazy-object.sh | 27 +++
>  2 files changed, 38 insertions(+)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index e29ff760b..238532cc2 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, 
> void *data, struct fsck_opt
>   return 0;
>   obj->flags |= REACHABLE;
>   if (!(obj->flags & HAS_OBJ)) {
> + if (repository_format_lazy_object)
> + /*
> +  * Return immediately; this is not an error, and further
> +  * recursion does not need to be performed on this
> +  * object since it is missing (so it does not need to be
> +  * added to "pending").
> +  */
> + return 0;
> +

The same comment as 2/4 applies here.

> @@ -212,6 +221,8 @@ static void check_reachable_object(struct object *obj)
>* do a full fsck
>*/
>   if (!(obj->flags & HAS_OBJ)) {
> + if (repository_format_lazy_object)
> + return;
>   if (has_sha1_pack(obj->oid.hash))
>   return; /* it is in pack - forget about it */
>   printf("missing %s %s\n", printable_type(obj),

Also this reminds as a related issue.  Imagine:

 - An object X was once retrieved, perhaps but not necessarily
   lazily, together with another object Y that is referred to by X
   (e.g. X is a tree, Y is a blob in the directory at path D, which
   is represented by X).

 - The same blob Y is added to the index in a different directory at
   path E.

 - The user decides to make this a slimmed-down "narrow clone" style
   repository and tells Git that path D is not interesting.  We lose
   X, but not Y because Y is still referenced from the index.

 - "git reset --hard" happens, and there no longer is any reference
   to Y.

Now, when we run fsck, should we diagnose Y as "unreachable and/or
dangling"?