Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-29 Thread Johannes Schindelin
Hi Jonathan,

On Fri, 26 Oct 2018, Jonathan Tan wrote:

> > So even better would be to use `is_promisor_object(oid) ||
> > has_object_file(oid)`, right?
> > 
> > This is something that is probably not even needed: as I mentioned,
> > the shallow commits are *expected* to be local. It should not ever
> > happen that they are fetched.
> 
> That would help, but I don't think it would help in the "fast-forward
> from A to B where A is B's parent" case I describe in [1].

I don't think that that analysis is correct. It assumes that there could
be a promised commit that is also listed as shallow. I do not think that
is a possible scenario.

And even if it were, why would asking for a promised commit object
download not only that object but also all of its trees and all of its
ancestors? That's not how I understood the idea of the lazy clone: I
understood promised objects to be downloaded on demand, individually.

> My suggestion was:
> 
> > It sounds safer to me to use the fast approach in this patch when the
> > repository is not partial, and stick to the slow approach when it is.
> 
> which can be done by replacing "prune_shallow(0, 1)" in patch 3 with
> "prune_shallow(0, !repository_format_partial_clone)", possibly with a comment
> that the fast method checks object existence for each shallow line directly,
> which is undesirable when the repository is a partial clone.

I am afraid that that would re-introduce the bug pointed out by Peff: you
*really* would need to traverse all reachable commits to use the non-fast
pruning method. And we simply don't.

Ciao,
Dscho

> (repository_format_partial_clone is non-NULL with the name of the promisor
> remote if the repository is a partial clone, and NULL otherwise).
> 
> [1] 
> https://public-inbox.org/git/20181025185459.206127-1-jonathanta...@google.com/
> 


Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-26 Thread Jonathan Tan
> Thanks for confirming.
> 
> So even better would be to use `is_promisor_object(oid) ||
> has_object_file(oid)`, right?
> 
> This is something that is probably not even needed: as I mentioned, the
> shallow commits are *expected* to be local. It should not ever happen that
> they are fetched.

That would help, but I don't think it would help in the "fast-forward
from A to B where A is B's parent" case I describe in [1].

My suggestion was:

> It sounds safer to me to use the fast approach in this patch when the
> repository is not partial, and stick to the slow approach when it is.

which can be done by replacing "prune_shallow(0, 1)" in patch 3 with
"prune_shallow(0, !repository_format_partial_clone)", possibly with a comment
that the fast method checks object existence for each shallow line directly,
which is undesirable when the repository is a partial clone.
(repository_format_partial_clone is non-NULL with the name of the promisor
remote if the repository is a partial clone, and NULL otherwise).

[1] 
https://public-inbox.org/git/20181025185459.206127-1-jonathanta...@google.com/


Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-26 Thread Johannes Schindelin
Hi Jonathan,

On Thu, 25 Oct 2018, Jonathan Tan wrote:

> > On Wed, 24 Oct 2018, Johannes Schindelin wrote:
> > 
> > Coming back to my question whether there is a better way to check for
> > the presence of a local commit, I figured that I can use
> > `has_object_file()` instead of looking up and parsing the commit, as
> > this code does not really need to verify that the shallow entry refers
> > to a commit, but only that it refers to a local object.
> 
> Note that has_object_file() also triggers the lazy fetch if needed, but
> I agree that it's better because you don't really need to parse the
> commit.

Thanks for confirming.

So even better would be to use `is_promisor_object(oid) ||
has_object_file(oid)`, right?

This is something that is probably not even needed: as I mentioned, the
shallow commits are *expected* to be local. It should not ever happen that
they are fetched.

> There is the possibility of just checking for loose objects (which does
> not trigger any lazy fetches), which works for builtin/prune since it
> only prunes loose objects, but doesn't work in the general case, I
> guess.

In the test case I added, the commit object is actually packed. And then,
because it became unreachable, it is dropped.

If I only checked for loose objects here, the shallow line would already
be removed when the commit gets packed, which would be the wrong thing to
do.

In short: thank you for confirming that the current version of the patch
seems to be the best we can do for now.

Ciao,
Dscho


Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-25 Thread Jonathan Tan
> On Wed, 24 Oct 2018, Johannes Schindelin wrote:
> 
> > On Wed, 24 Oct 2018, Junio C Hamano wrote:
> > 
> > > Jonathan, do you see any issues with the use of lookup_commit() in
> > > this change wrt lazy clone?  I am wondering what happens when the
> > > commit in question is at, an immediate parent of, or an immediate
> > > child of a promisor object.  I _think_ this change won't make it
> > > worse for two features in playing together, but thought that it
> > > would be better to double check.
> > 
> > Good point.
> > 
> > Instinctively, I would say that no promised object can be a shallow
> > commit. The entire idea of a shallow commit is that it *is* present, but
> > none of its parents.

I'm envisioning a client repo with a single branch, cloned both with
--depth=1 and with --filter=, that has just fetched to the same
branch also with --depth=1 resulting in a fast-forward from A to B.

If A is B's parent, then A would be known to be promised. (Incidentally,
the problem is greater in current Git, because for performance reasons,
we do not check promisor status when lazily fetching - so it doesn't
matter here whether an object is known to be promised or not.)

When pruning shallow and checking the existence of A, this would trigger
a fetch for A, which would download all commits and trees reachable from
it.

It sounds safer to me to use the fast approach in this patch when the
repository is not partial, and stick to the slow approach when it is.

> > However, I am curious whether there is a better way to check for the
> > presence of a local commit? Do we have an API function for that, that I
> > missed? (I do not really want to parse the commit, after all, just verify
> > that it is not pruned.)
> 
> Okay, I looked around a bit. It seems that there is an
> `is_promisor_object(oid)` function in `pu` to see whether an object was
> promised. If need be (and I am still not convinced that there is a need),
> then we can always add a call to that function to the condition.

I don't think there is a need for is_promisor_object() either - as
described above, it doesn't completely solve the problem.

> Coming back to my question whether there is a better way to check for the
> presence of a local commit, I figured that I can use `has_object_file()`
> instead of looking up and parsing the commit, as this code does not really
> need to verify that the shallow entry refers to a commit, but only that it
> refers to a local object.

Note that has_object_file() also triggers the lazy fetch if needed, but
I agree that it's better because you don't really need to parse the
commit.

There is the possibility of just checking for loose objects (which does
not trigger any lazy fetches), which works for builtin/prune since it
only prunes loose objects, but doesn't work in the general case, I
guess.


Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-24 Thread Johannes Schindelin
Hi,

On Wed, 24 Oct 2018, Johannes Schindelin wrote:

> On Wed, 24 Oct 2018, Junio C Hamano wrote:
> 
> > Jonathan, do you see any issues with the use of lookup_commit() in
> > this change wrt lazy clone?  I am wondering what happens when the
> > commit in question is at, an immediate parent of, or an immediate
> > child of a promisor object.  I _think_ this change won't make it
> > worse for two features in playing together, but thought that it
> > would be better to double check.
> 
> Good point.
> 
> Instinctively, I would say that no promised object can be a shallow
> commit. The entire idea of a shallow commit is that it *is* present, but
> none of its parents.
> 
> Also, I would claim that the shallow feature does not make sense with lazy
> clones, as lazy clones offer a superset of shallow clone's functionality.
> 
> However, I am curious whether there is a better way to check for the
> presence of a local commit? Do we have an API function for that, that I
> missed? (I do not really want to parse the commit, after all, just verify
> that it is not pruned.)

Okay, I looked around a bit. It seems that there is an
`is_promisor_object(oid)` function in `pu` to see whether an object was
promised. If need be (and I am still not convinced that there is a need),
then we can always add a call to that function to the condition.

Coming back to my question whether there is a better way to check for the
presence of a local commit, I figured that I can use `has_object_file()`
instead of looking up and parsing the commit, as this code does not really
need to verify that the shallow entry refers to a commit, but only that it
refers to a local object.

So I'll send another iteration shortly, with this diff applied to 2/3:

-- snip --
diff --git a/shallow.c b/shallow.c
index 9c3faa8af243..02fdbfc554c4 100644
--- a/shallow.c
+++ b/shallow.c
@@ -263,8 +263,7 @@ static int write_one_shallow(const struct commit_graft 
*graft, void *cb_data)
if (graft->nr_parent != -1)
return 0;
if (data->flags & QUICK) {
-   struct commit *c = lookup_commit(the_repository, >oid);
-   if (!c || parse_commit(c))
+   if (!has_object_file(>oid))
return 0;
} else if (data->flags & SEEN_ONLY) {
struct commit *c = lookup_commit(the_repository, >oid);
--snap --

Ciao,
Dscho

> 
> > 
> > "Johannes Schindelin via GitGitGadget" 
> > writes:
> > 
> > > From: Johannes Schindelin 
> > >
> > > The `prune_shallow()` function wants a full reachability check to be
> > > completed before it goes to work, to ensure that all unreachable entries
> > > are removed from the shallow file.
> > >
> > > However, in the upcoming patch we do not even want to go that far. We
> > > really only need to remove entries corresponding to pruned commits, i.e.
> > > to commits that no longer exist.
> > >
> > > Let's support that use case.
> > >
> > > Signed-off-by: Johannes Schindelin 
> > > ---
> > >  builtin/prune.c |  2 +-
> > >  commit.h|  2 +-
> > >  shallow.c   | 22 +-
> > >  3 files changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/builtin/prune.c b/builtin/prune.c
> > > index 41230f821..6d6ab6cf1 100644
> > > --- a/builtin/prune.c
> > > +++ b/builtin/prune.c
> > > @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char 
> > > *prefix)
> > >   free(s);
> > >  
> > >   if (is_repository_shallow(the_repository))
> > > - prune_shallow(show_only);
> > > + prune_shallow(show_only, 0);
> > >  
> > >   return 0;
> > >  }
> > > diff --git a/commit.h b/commit.h
> > > index 1d260d62f..ff34447ab 100644
> > > --- a/commit.h
> > > +++ b/commit.h
> > > @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct 
> > > shallow_info *info,
> > >  uint32_t **used,
> > >  int *ref_status);
> > >  extern int delayed_reachability_test(struct shallow_info *si, int c);
> > > -extern void prune_shallow(int show_only);
> > > +extern void prune_shallow(int show_only, int quick_prune);
> > >  extern struct trace_key trace_shallow;
> > >  
> > >  extern int interactive_add(int argc, const char **argv, const char 
> > > *prefix, int patch);
> > > diff --git a/shallow.c b/shallow.c
> > > index 732e18d54..0a2671bc2 100644
> > > --- a/shallow.c
> > > +++ b/shallow.c
> > > @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct 
> > > repository *r)
> > >  
> > >  #define SEEN_ONLY 1
> > >  #define VERBOSE   2
> > > +#define QUICK_PRUNE 4
> > >  
> > >  struct write_shallow_data {
> > >   struct strbuf *out;
> > > @@ -261,7 +262,11 @@ static int write_one_shallow(const struct 
> > > commit_graft *graft, void *cb_data)
> > >   const char *hex = oid_to_hex(>oid);
> > >   if (graft->nr_parent != -1)
> > >   return 0;
> > > - if (data->flags & SEEN_ONLY) {
> > > + if (data->flags & QUICK_PRUNE) {
> > > +  

Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-24 Thread Johannes Schindelin
Hi Junio & Jonathan,

On Wed, 24 Oct 2018, Junio C Hamano wrote:

> Jonathan, do you see any issues with the use of lookup_commit() in
> this change wrt lazy clone?  I am wondering what happens when the
> commit in question is at, an immediate parent of, or an immediate
> child of a promisor object.  I _think_ this change won't make it
> worse for two features in playing together, but thought that it
> would be better to double check.

Good point.

Instinctively, I would say that no promised object can be a shallow
commit. The entire idea of a shallow commit is that it *is* present, but
none of its parents.

Also, I would claim that the shallow feature does not make sense with lazy
clones, as lazy clones offer a superset of shallow clone's functionality.

However, I am curious whether there is a better way to check for the
presence of a local commit? Do we have an API function for that, that I
missed? (I do not really want to parse the commit, after all, just verify
that it is not pruned.)

Thanks,
Dscho

> 
> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > From: Johannes Schindelin 
> >
> > The `prune_shallow()` function wants a full reachability check to be
> > completed before it goes to work, to ensure that all unreachable entries
> > are removed from the shallow file.
> >
> > However, in the upcoming patch we do not even want to go that far. We
> > really only need to remove entries corresponding to pruned commits, i.e.
> > to commits that no longer exist.
> >
> > Let's support that use case.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  builtin/prune.c |  2 +-
> >  commit.h|  2 +-
> >  shallow.c   | 22 +-
> >  3 files changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/builtin/prune.c b/builtin/prune.c
> > index 41230f821..6d6ab6cf1 100644
> > --- a/builtin/prune.c
> > +++ b/builtin/prune.c
> > @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char 
> > *prefix)
> > free(s);
> >  
> > if (is_repository_shallow(the_repository))
> > -   prune_shallow(show_only);
> > +   prune_shallow(show_only, 0);
> >  
> > return 0;
> >  }
> > diff --git a/commit.h b/commit.h
> > index 1d260d62f..ff34447ab 100644
> > --- a/commit.h
> > +++ b/commit.h
> > @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct 
> > shallow_info *info,
> >uint32_t **used,
> >int *ref_status);
> >  extern int delayed_reachability_test(struct shallow_info *si, int c);
> > -extern void prune_shallow(int show_only);
> > +extern void prune_shallow(int show_only, int quick_prune);
> >  extern struct trace_key trace_shallow;
> >  
> >  extern int interactive_add(int argc, const char **argv, const char 
> > *prefix, int patch);
> > diff --git a/shallow.c b/shallow.c
> > index 732e18d54..0a2671bc2 100644
> > --- a/shallow.c
> > +++ b/shallow.c
> > @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct 
> > repository *r)
> >  
> >  #define SEEN_ONLY 1
> >  #define VERBOSE   2
> > +#define QUICK_PRUNE 4
> >  
> >  struct write_shallow_data {
> > struct strbuf *out;
> > @@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft 
> > *graft, void *cb_data)
> > const char *hex = oid_to_hex(>oid);
> > if (graft->nr_parent != -1)
> > return 0;
> > -   if (data->flags & SEEN_ONLY) {
> > +   if (data->flags & QUICK_PRUNE) {
> > +   struct commit *c = lookup_commit(the_repository, >oid);
> > +   if (!c || parse_commit(c))
> > +   return 0;
> > +   } else if (data->flags & SEEN_ONLY) {
> > struct commit *c = lookup_commit(the_repository, >oid);
> > if (!c || !(c->object.flags & SEEN)) {
> > if (data->flags & VERBOSE)
> > @@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
> >  
> >  /*
> >   * mark_reachable_objects() should have been run prior to this and all
> > - * reachable commits marked as "SEEN".
> > + * reachable commits marked as "SEEN", except when quick_prune is non-zero,
> > + * in which case lines are excised from the shallow file if they refer to
> > + * commits that do not exist (any longer).
> >   */
> > -void prune_shallow(int show_only)
> > +void prune_shallow(int show_only, int quick_prune)
> >  {
> > struct lock_file shallow_lock = LOCK_INIT;
> > struct strbuf sb = STRBUF_INIT;
> > +   unsigned flags = SEEN_ONLY;
> > int fd;
> >  
> > +   if (quick_prune)
> > +   flags |= QUICK_PRUNE;
> > +
> > if (show_only) {
> > -   write_shallow_commits_1(, 0, NULL, SEEN_ONLY | VERBOSE);
> > +   flags |= VERBOSE;
> > +   write_shallow_commits_1(, 0, NULL, flags);
> > strbuf_release();
> > return;
> > }
> > @@ -388,7 +400,7 @@ void prune_shallow(int show_only)
> >git_path_shallow(the_repository),
> >  

Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-23 Thread Junio C Hamano
Jonathan, do you see any issues with the use of lookup_commit() in
this change wrt lazy clone?  I am wondering what happens when the
commit in question is at, an immediate parent of, or an immediate
child of a promisor object.  I _think_ this change won't make it
worse for two features in playing together, but thought that it
would be better to double check.

"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> The `prune_shallow()` function wants a full reachability check to be
> completed before it goes to work, to ensure that all unreachable entries
> are removed from the shallow file.
>
> However, in the upcoming patch we do not even want to go that far. We
> really only need to remove entries corresponding to pruned commits, i.e.
> to commits that no longer exist.
>
> Let's support that use case.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/prune.c |  2 +-
>  commit.h|  2 +-
>  shallow.c   | 22 +-
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 41230f821..6d6ab6cf1 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -161,7 +161,7 @@ int cmd_prune(int argc, const char **argv, const char 
> *prefix)
>   free(s);
>  
>   if (is_repository_shallow(the_repository))
> - prune_shallow(show_only);
> + prune_shallow(show_only, 0);
>  
>   return 0;
>  }
> diff --git a/commit.h b/commit.h
> index 1d260d62f..ff34447ab 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -249,7 +249,7 @@ extern void assign_shallow_commits_to_refs(struct 
> shallow_info *info,
>  uint32_t **used,
>  int *ref_status);
>  extern int delayed_reachability_test(struct shallow_info *si, int c);
> -extern void prune_shallow(int show_only);
> +extern void prune_shallow(int show_only, int quick_prune);
>  extern struct trace_key trace_shallow;
>  
>  extern int interactive_add(int argc, const char **argv, const char *prefix, 
> int patch);
> diff --git a/shallow.c b/shallow.c
> index 732e18d54..0a2671bc2 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -247,6 +247,7 @@ static void check_shallow_file_for_update(struct 
> repository *r)
>  
>  #define SEEN_ONLY 1
>  #define VERBOSE   2
> +#define QUICK_PRUNE 4
>  
>  struct write_shallow_data {
>   struct strbuf *out;
> @@ -261,7 +262,11 @@ static int write_one_shallow(const struct commit_graft 
> *graft, void *cb_data)
>   const char *hex = oid_to_hex(>oid);
>   if (graft->nr_parent != -1)
>   return 0;
> - if (data->flags & SEEN_ONLY) {
> + if (data->flags & QUICK_PRUNE) {
> + struct commit *c = lookup_commit(the_repository, >oid);
> + if (!c || parse_commit(c))
> + return 0;
> + } else if (data->flags & SEEN_ONLY) {
>   struct commit *c = lookup_commit(the_repository, >oid);
>   if (!c || !(c->object.flags & SEEN)) {
>   if (data->flags & VERBOSE)
> @@ -371,16 +376,23 @@ void advertise_shallow_grafts(int fd)
>  
>  /*
>   * mark_reachable_objects() should have been run prior to this and all
> - * reachable commits marked as "SEEN".
> + * reachable commits marked as "SEEN", except when quick_prune is non-zero,
> + * in which case lines are excised from the shallow file if they refer to
> + * commits that do not exist (any longer).
>   */
> -void prune_shallow(int show_only)
> +void prune_shallow(int show_only, int quick_prune)
>  {
>   struct lock_file shallow_lock = LOCK_INIT;
>   struct strbuf sb = STRBUF_INIT;
> + unsigned flags = SEEN_ONLY;
>   int fd;
>  
> + if (quick_prune)
> + flags |= QUICK_PRUNE;
> +
>   if (show_only) {
> - write_shallow_commits_1(, 0, NULL, SEEN_ONLY | VERBOSE);
> + flags |= VERBOSE;
> + write_shallow_commits_1(, 0, NULL, flags);
>   strbuf_release();
>   return;
>   }
> @@ -388,7 +400,7 @@ void prune_shallow(int show_only)
>  git_path_shallow(the_repository),
>  LOCK_DIE_ON_ERROR);
>   check_shallow_file_for_update(the_repository);
> - if (write_shallow_commits_1(, 0, NULL, SEEN_ONLY)) {
> + if (write_shallow_commits_1(, 0, NULL, flags)) {
>   if (write_in_full(fd, sb.buf, sb.len) < 0)
>   die_errno("failed to write to %s",
> get_lock_file_path(_lock));