Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> I don't understand about the ">= 0". What should I replace it with?
> Maybe you mean the return is never positive so I can change:
> 
> parse_tree_gently(tree, 1) >= 0
> 
> to:
> !parse_tree_gently(tree, 1)
> 
> ?

Sorry for the lack of clarity - that is what I meant.

> > The missing mechanism (for error, allow-any, print) should work without
> > needing to consult whether an object is a promisor object or not - it
> > should just print whatever is missing, so the "if
> > (!is_promisor_object..." line looks out of place.
> Done. I considered that a missing object which is not a promisor is a
> serious error, so I had it die here.

It is a serious error, but as far as I can tell, that is what the
--missing flags are supposed to help diagnose (so we can't die since we
need the diagnoses to be printed). See, for example, 'rev-list W/
--missing=print' in t6112 - the "r1" repository does not have partial
clone enabled (which I verified by inserting a test_pause then cat-ting
r1/.git/config), but nothing dies.

> But now that I've added the
> do_not_die_on_missing_tree flag, it's more natural to keep the
> previous promisor check as-is.

OK, I'll take a look once you send out v4.

> Also, is_promisor_object is an
> expensive check, and it would be better to skip it during the common
> execution path (which should be when exclude_promisor_objects, an
> internal-use-only flag, is *not* set, which means we never call
> is_promisor_object.

That's true.

> > In my original review [1], I suggested that we always show a tree if we
> > have its hash - if we don't have the object, we just recurse into it.
> > This would be the same as your patch, except that the 'die("bad tree
> > object...' is totally removed instead of merely moved. I still think
> > this solution has some merit - all the tests still pass (except that we
> > need to check for "unable to read" instead of "bad tree object" in error
> > messages), but I just realized that it might still be backwards
> > incompatible in that a basic "rev-list --objects" would now succeed
> > instead of fail if a tree was missing (I haven't tested this though).
> The presence of the die if !is_promisor_object is what justified the
> changing of the parse_tree_gently to always be gently, since it is
> what showed the OID. Can we really remove both? Maybe in a different
> patch set, since I'm no longer touching that line?

That's true - the idea of removing both needs more thought, and if we
were to do so, we definitely can do it in a different patch set.

> > We might need a flag called "do_not_die_on_missing_tree" (much like your
> > original idea of "show_missing_trees") so that callers that are prepared
> > to deal with missing trees can set this. Sorry for the churn. You can
> > document it as such:
> Added it, but not with a command-line flag, only in rev-info.h. We can
> always  add a flag later if people have been relying on the existing
> behavior of git rev-list to balk at missing trees. (That seems
> unlikely though, considering there is no filter to enable that before
> this patchset).

By flag, I indeed meant in rev-info.h - sorry for the confusion. That
sounds good.


Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-13 Thread Matthew DeVore
Resending this in plain-text mode so that git@vger.kernel.org won't
bounce it. Sorry for those of you receiving this twice.

On Mon, Aug 13, 2018 at 11:20 AM Jonathan Tan  wrote:
>
> > In list-objects.c we no longer print a message to stderr if a tree
> > object is missing (quiet_on_missing is always true). I couldn't find
> > any place where this would matter, or where the caller of
> > traverse_commit_list would need to be fixed to show the error. However,
> > in the future it would be trivial to make the caller show the message if
> > we needed to.
>
> Indeed, and I'm not sure why the message was there in the first place -
> if parsing fails when revs->ignore_missing_links and
> revs->exclude_promisor_objects are both false, we print the OID anyway
> in the "die" call, so any message printed by parse_tree_gently() seems
> superfluous.
>
> It might be better to add an additional commit that removes the "gently"
> condition (in other words, always parsing gently), with a message
> explaining the above. Also, in that commit, I prefer not to add the
> "/*quiet_on_missing*/" explanation (we don't seem to do that in Git
> code); I also know that the ">= 0" is a holdover from the existing "< 0"
> code, but we don't need to do that either.
Good idea. I've added a new commit which replaces the calculation with
a hard-coded "1"
I don't understand about the ">= 0". What should I replace it with?
Maybe you mean the return is never positive so I can change:

parse_tree_gently(tree, 1) >= 0

to:
!parse_tree_gently(tree, 1)

?

>
> > This is not tested very thoroughly, since we cannot create promisor
> > objects in tests without using an actual partial clone. t0410 has a
> > promise_and_delete utility function, but the is_promisor_object function
> > does not return 1 for objects deleted in this way. More tests will will
> > come in a patch that implements a filter that can be used with git
> > clone.
>
> is_promisor_object() should. If you still have the code you used to
> verify that, can you share it? In particular, pay attention to the path
> of the repo - promise_and_delete is hardcoded to use one particular
> path.
It turns out I wasn't setting the extensions.partial_clone config in
my test, and that's why everything wasn't working. So I've moved all
the tests feasible back to the earlier commit. Cool :)

>
> Whether you test in this patch or in the last patch, make sure that the
> following are tested:
>  git rev-list --missing=error, allow-any, allow-promisor, print
>  git rev-list --exclude-promisor-objects
>
Added --missing=print, --missing=allow-any, and
--exclude-promisor-objects to t0410
--missing=allow-promisor did some seem sufficiently interesting or
different from allow-any to justify adding it.
I had to put missing=error into the commit that introduces the tree:0
filter, since that flag causes an automatic attempt to fetch the
missing object, which t0410 does not seem to support. So added test
case "auto-fetching of trees with --missing=error" to t5616.

> Also, test when a tree pointed to by a commit is missing, and when a
> tree pointed to by a tree is missing.
Former is done multiple times already, added latter to t0410 as
"missing non-root tree object and rev-list."
>
> > @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context 
> > *ctx,
> >   die("bad tree object");
> >   if (obj->flags & (UNINTERESTING | SEEN))
> >   return;
> > - if (parse_tree_gently(tree, gently) < 0) {
> > + parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> > + if (!parsed) {
> >   if (revs->ignore_missing_links)
> >   return;
> >
> > + if (!is_promisor_object(>oid))
> > + die("bad tree object %s", oid_to_hex(>oid));
> > +
> >   /*
> >* Pre-filter known-missing tree objects when explicitly
> >* requested.  This may cause the actual filter to report
> >* an incomplete list of missing objects.
> >*/
> > - if (revs->exclude_promisor_objects &&
> > - is_promisor_object(>oid))
> > + if (revs->exclude_promisor_objects)
> >   return;
> > -
> > - die("bad tree object %s", oid_to_hex(>oid));
> >   }
>
> The missing mechanism (for error, allow-any, print) should work without
> needing to consult whether an object is a promisor object or not - it
> should just print whatever is missing, so the "if
> (!is_promisor_object..." line looks out of place.
Done. I considered that a missing object which is not a promisor is a
serious error, so I had it die here. But now that I've added the
do_not_die_on_missing_tree flag, it's more natural to keep the
previous promisor check as-is. Also, is_promisor_object is an
expensive check, and it would be better to skip it during the common
execution path (which should be when exclude_promisor_objects, an

Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-13 Thread Jonathan Tan
> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message if
> we needed to.

Indeed, and I'm not sure why the message was there in the first place -
if parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the "die" call, so any message printed by parse_tree_gently() seems
superfluous.

It might be better to add an additional commit that removes the "gently"
condition (in other words, always parsing gently), with a message
explaining the above. Also, in that commit, I prefer not to add the
"/*quiet_on_missing*/" explanation (we don't seem to do that in Git
code); I also know that the ">= 0" is a holdover from the existing "< 0"
code, but we don't need to do that either.

> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object function
> does not return 1 for objects deleted in this way. More tests will will
> come in a patch that implements a filter that can be used with git
> clone.

is_promisor_object() should. If you still have the code you used to
verify that, can you share it? In particular, pay attention to the path
of the repo - promise_and_delete is hardcoded to use one particular
path.

Whether you test in this patch or in the last patch, make sure that the
following are tested:
 git rev-list --missing=error, allow-any, allow-promisor, print
 git rev-list --exclude-promisor-objects

Also, test when a tree pointed to by a commit is missing, and when a
tree pointed to by a tree is missing.

> @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
>   die("bad tree object");
>   if (obj->flags & (UNINTERESTING | SEEN))
>   return;
> - if (parse_tree_gently(tree, gently) < 0) {
> + parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> + if (!parsed) {
>   if (revs->ignore_missing_links)
>   return;
>  
> + if (!is_promisor_object(>oid))
> + die("bad tree object %s", oid_to_hex(>oid));
> +
>   /*
>* Pre-filter known-missing tree objects when explicitly
>* requested.  This may cause the actual filter to report
>* an incomplete list of missing objects.
>*/
> - if (revs->exclude_promisor_objects &&
> - is_promisor_object(>oid))
> + if (revs->exclude_promisor_objects)
>   return;
> -
> - die("bad tree object %s", oid_to_hex(>oid));
>   }

The missing mechanism (for error, allow-any, print) should work without
needing to consult whether an object is a promisor object or not - it
should just print whatever is missing, so the "if
(!is_promisor_object..." line looks out of place.

In my original review [1], I suggested that we always show a tree if we
have its hash - if we don't have the object, we just recurse into it.
This would be the same as your patch, except that the 'die("bad tree
object...' is totally removed instead of merely moved. I still think
this solution has some merit - all the tests still pass (except that we
need to check for "unable to read" instead of "bad tree object" in error
messages), but I just realized that it might still be backwards
incompatible in that a basic "rev-list --objects" would now succeed
instead of fail if a tree was missing (I haven't tested this though).

We might need a flag called "do_not_die_on_missing_tree" (much like your
original idea of "show_missing_trees") so that callers that are prepared
to deal with missing trees can set this. Sorry for the churn. You can
document it as such:

 Blobs are shown without regard for their existence. But not so for
 trees: unless exclude_promisor_objects is set and the tree in question
 is a promisor object, or ignore_missing_links is set (and in this case,
 the tree in question may or may not be a promisor object), the revision
 walker dies with a "bad tree object" message when encountering a
 missing tree.

 For callers that can handle missing trees and want them to be
 filterable and showable, set this to true. The revision walker will
 filter and show such a missing tree as usual, but will not attempt to
 recurse into this tree object.

[1] 
https://public-inbox.org/git/20180810002411.13447-1-jonathanta...@google.com/


[PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-10 Thread Matthew DeVore
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message if
we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object function
does not return 1 for objects deleted in this way. More tests will will
come in a patch that implements a filter that can be used with git
clone.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 10 ++
 list-objects.c | 17 +
 t/t5317-pack-objects-filter-objects.sh | 13 +
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..ea0daf0c4 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(>oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(>oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
if (is_promisor_object(>oid))
return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(>oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(>oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(>oid)) {
+   if (!has_object_file(>oid)) {
finish_object__ma(obj);
return 1;
}
diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..aedcd0228 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
+   int parsed;
 
if (!revs->tree_objects)
return;
@@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
+   if (!parsed) {
if (revs->ignore_missing_links)
return;
 
+   if (!is_promisor_object(>oid))
+   die("bad tree object %s", oid_to_hex(>oid));
+
/*
 * Pre-filter known-missing tree objects when explicitly
 * requested.  This may cause the actual filter to report
 * an incomplete list of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
-   is_promisor_object(>oid))
+   if (revs->exclude_promisor_objects)
return;
-
-   die("bad tree object %s", oid_to_hex(>oid));
}
 
strbuf_addstr(base, name);
@@ -180,7 +180,8 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   if (parsed)
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/t/t5317-pack-objects-filter-objects.sh 
b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++