Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> I don't understand about the ">= 0". What should I replace it with?
> Maybe you mean the return is never positive so I can change:
> 
> parse_tree_gently(tree, 1) >= 0
> 
> to:
> !parse_tree_gently(tree, 1)
> 
> ?

Sorry for the lack of clarity - that is what I meant.

> > The missing mechanism (for error, allow-any, print) should work without
> > needing to consult whether an object is a promisor object or not - it
> > should just print whatever is missing, so the "if
> > (!is_promisor_object..." line looks out of place.
> Done. I considered that a missing object which is not a promisor is a
> serious error, so I had it die here.

It is a serious error, but as far as I can tell, that is what the
--missing flags are supposed to help diagnose (so we can't die since we
need the diagnoses to be printed). See, for example, 'rev-list W/
--missing=print' in t6112 - the "r1" repository does not have partial
clone enabled (which I verified by inserting a test_pause then cat-ting
r1/.git/config), but nothing dies.

> But now that I've added the
> do_not_die_on_missing_tree flag, it's more natural to keep the
> previous promisor check as-is.

OK, I'll take a look once you send out v4.

> Also, is_promisor_object is an
> expensive check, and it would be better to skip it during the common
> execution path (which should be when exclude_promisor_objects, an
> internal-use-only flag, is *not* set, which means we never call
> is_promisor_object.

That's true.

> > In my original review [1], I suggested that we always show a tree if we
> > have its hash - if we don't have the object, we just recurse into it.
> > This would be the same as your patch, except that the 'die("bad tree
> > object...' is totally removed instead of merely moved. I still think
> > this solution has some merit - all the tests still pass (except that we
> > need to check for "unable to read" instead of "bad tree object" in error
> > messages), but I just realized that it might still be backwards
> > incompatible in that a basic "rev-list --objects" would now succeed
> > instead of fail if a tree was missing (I haven't tested this though).
> The presence of the die if !is_promisor_object is what justified the
> changing of the parse_tree_gently to always be gently, since it is
> what showed the OID. Can we really remove both? Maybe in a different
> patch set, since I'm no longer touching that line?

That's true - the idea of removing both needs more thought, and if we
were to do so, we definitely can do it in a different patch set.

> > We might need a flag called "do_not_die_on_missing_tree" (much like your
> > original idea of "show_missing_trees") so that callers that are prepared
> > to deal with missing trees can set this. Sorry for the churn. You can
> > document it as such:
> Added it, but not with a command-line flag, only in rev-info.h. We can
> always  add a flag later if people have been relying on the existing
> behavior of git rev-list to balk at missing trees. (That seems
> unlikely though, considering there is no filter to enable that before
> this patchset).

By flag, I indeed meant in rev-info.h - sorry for the confusion. That
sounds good.


Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-13 Thread Matthew DeVore
Resending this in plain-text mode so that git@vger.kernel.org won't
bounce it. Sorry for those of you receiving this twice.

On Mon, Aug 13, 2018 at 11:20 AM Jonathan Tan  wrote:
>
> > In list-objects.c we no longer print a message to stderr if a tree
> > object is missing (quiet_on_missing is always true). I couldn't find
> > any place where this would matter, or where the caller of
> > traverse_commit_list would need to be fixed to show the error. However,
> > in the future it would be trivial to make the caller show the message if
> > we needed to.
>
> Indeed, and I'm not sure why the message was there in the first place -
> if parsing fails when revs->ignore_missing_links and
> revs->exclude_promisor_objects are both false, we print the OID anyway
> in the "die" call, so any message printed by parse_tree_gently() seems
> superfluous.
>
> It might be better to add an additional commit that removes the "gently"
> condition (in other words, always parsing gently), with a message
> explaining the above. Also, in that commit, I prefer not to add the
> "/*quiet_on_missing*/" explanation (we don't seem to do that in Git
> code); I also know that the ">= 0" is a holdover from the existing "< 0"
> code, but we don't need to do that either.
Good idea. I've added a new commit which replaces the calculation with
a hard-coded "1"
I don't understand about the ">= 0". What should I replace it with?
Maybe you mean the return is never positive so I can change:

parse_tree_gently(tree, 1) >= 0

to:
!parse_tree_gently(tree, 1)

?

>
> > This is not tested very thoroughly, since we cannot create promisor
> > objects in tests without using an actual partial clone. t0410 has a
> > promise_and_delete utility function, but the is_promisor_object function
> > does not return 1 for objects deleted in this way. More tests will will
> > come in a patch that implements a filter that can be used with git
> > clone.
>
> is_promisor_object() should. If you still have the code you used to
> verify that, can you share it? In particular, pay attention to the path
> of the repo - promise_and_delete is hardcoded to use one particular
> path.
It turns out I wasn't setting the extensions.partial_clone config in
my test, and that's why everything wasn't working. So I've moved all
the tests feasible back to the earlier commit. Cool :)

>
> Whether you test in this patch or in the last patch, make sure that the
> following are tested:
>  git rev-list --missing=error, allow-any, allow-promisor, print
>  git rev-list --exclude-promisor-objects
>
Added --missing=print, --missing=allow-any, and
--exclude-promisor-objects to t0410
--missing=allow-promisor did some seem sufficiently interesting or
different from allow-any to justify adding it.
I had to put missing=error into the commit that introduces the tree:0
filter, since that flag causes an automatic attempt to fetch the
missing object, which t0410 does not seem to support. So added test
case "auto-fetching of trees with --missing=error" to t5616.

> Also, test when a tree pointed to by a commit is missing, and when a
> tree pointed to by a tree is missing.
Former is done multiple times already, added latter to t0410 as
"missing non-root tree object and rev-list."
>
> > @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context 
> > *ctx,
> >   die("bad tree object");
> >   if (obj->flags & (UNINTERESTING | SEEN))
> >   return;
> > - if (parse_tree_gently(tree, gently) < 0) {
> > + parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> > + if (!parsed) {
> >   if (revs->ignore_missing_links)
> >   return;
> >
> > + if (!is_promisor_object(&obj->oid))
> > + die("bad tree object %s", oid_to_hex(&obj->oid));
> > +
> >   /*
> >* Pre-filter known-missing tree objects when explicitly
> >* requested.  This may cause the actual filter to report
> >* an incomplete list of missing objects.
> >*/
> > - if (revs->exclude_promisor_objects &&
> > - is_promisor_object(&obj->oid))
> > + if (revs->exclude_promisor_objects)
> >   return;
> > -
> > - die("bad tree object %s", oid_to_hex(&obj->oid));
> >   }
>
> The missing mechanism (for error, allow-any, print) should work without
> needing to consult whether an object is a promisor object or not - it
> should just print whatever is missing, so the "if
> (!is_promisor_object..." line looks out of place.
Done. I considered that a missing object which is not a promisor is a
serious error, so I had it die here. But now that I've added the
do_not_die_on_missing_tree flag, it's more natural to keep the
previous promisor check as-is. Also, is_promisor_object is an
expensive check, and it would be better to skip it during the common
execution path (which should be when exclude_promisor_objects, 

Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-13 Thread Jonathan Tan
> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message if
> we needed to.

Indeed, and I'm not sure why the message was there in the first place -
if parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the "die" call, so any message printed by parse_tree_gently() seems
superfluous.

It might be better to add an additional commit that removes the "gently"
condition (in other words, always parsing gently), with a message
explaining the above. Also, in that commit, I prefer not to add the
"/*quiet_on_missing*/" explanation (we don't seem to do that in Git
code); I also know that the ">= 0" is a holdover from the existing "< 0"
code, but we don't need to do that either.

> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object function
> does not return 1 for objects deleted in this way. More tests will will
> come in a patch that implements a filter that can be used with git
> clone.

is_promisor_object() should. If you still have the code you used to
verify that, can you share it? In particular, pay attention to the path
of the repo - promise_and_delete is hardcoded to use one particular
path.

Whether you test in this patch or in the last patch, make sure that the
following are tested:
 git rev-list --missing=error, allow-any, allow-promisor, print
 git rev-list --exclude-promisor-objects

Also, test when a tree pointed to by a commit is missing, and when a
tree pointed to by a tree is missing.

> @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
>   die("bad tree object");
>   if (obj->flags & (UNINTERESTING | SEEN))
>   return;
> - if (parse_tree_gently(tree, gently) < 0) {
> + parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> + if (!parsed) {
>   if (revs->ignore_missing_links)
>   return;
>  
> + if (!is_promisor_object(&obj->oid))
> + die("bad tree object %s", oid_to_hex(&obj->oid));
> +
>   /*
>* Pre-filter known-missing tree objects when explicitly
>* requested.  This may cause the actual filter to report
>* an incomplete list of missing objects.
>*/
> - if (revs->exclude_promisor_objects &&
> - is_promisor_object(&obj->oid))
> + if (revs->exclude_promisor_objects)
>   return;
> -
> - die("bad tree object %s", oid_to_hex(&obj->oid));
>   }

The missing mechanism (for error, allow-any, print) should work without
needing to consult whether an object is a promisor object or not - it
should just print whatever is missing, so the "if
(!is_promisor_object..." line looks out of place.

In my original review [1], I suggested that we always show a tree if we
have its hash - if we don't have the object, we just recurse into it.
This would be the same as your patch, except that the 'die("bad tree
object...' is totally removed instead of merely moved. I still think
this solution has some merit - all the tests still pass (except that we
need to check for "unable to read" instead of "bad tree object" in error
messages), but I just realized that it might still be backwards
incompatible in that a basic "rev-list --objects" would now succeed
instead of fail if a tree was missing (I haven't tested this though).

We might need a flag called "do_not_die_on_missing_tree" (much like your
original idea of "show_missing_trees") so that callers that are prepared
to deal with missing trees can set this. Sorry for the churn. You can
document it as such:

 Blobs are shown without regard for their existence. But not so for
 trees: unless exclude_promisor_objects is set and the tree in question
 is a promisor object, or ignore_missing_links is set (and in this case,
 the tree in question may or may not be a promisor object), the revision
 walker dies with a "bad tree object" message when encountering a
 missing tree.

 For callers that can handle missing trees and want them to be
 filterable and showable, set this to true. The revision walker will
 filter and show such a missing tree as usual, but will not attempt to
 recurse into this tree object.

[1] 
https://public-inbox.org/git/20180810002411.13447-1-jonathanta...@google.com/