Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list

2016-02-19 Thread Duy Nguyen
On Mon, Feb 08, 2016 at 01:09:24PM -0800, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy   writes:
> 
> > Instead of a custom commit walker like get_shallow_commits(), this new
> > function uses rev-list to mark NOT_SHALLOW to all reachable commits,
> > except borders. The definition of reachable is to be defined by the
> > protocol later. This makes it more flexible to define shallow boundary.
> >
> > Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent,
> > then it's considered the boundary. Which means in the client side, this
> > commit has _no_ parents. This could lead to surprising cuts if we're not
> > careful.
> >
> > Another option is to include more commits and only mark commits whose
> > all parents are SHALLOW as boundary.
> 
> The second and third are greek to me at this point ;-) but hopefully
> they will become clear as we read on.

Yeah. Everything looks clearer with illustration. This should be a
better. The question is should we do something about it now, or leave
it as is.

I'm tempted to go with "the first way" in future (so add some comments
about this in is_repository_shallow, instead of leaving it as commit
message in this patch).

-- 8< --
The way we find find border is paint all reachable commits NOT_SHALLOW.
Any of them that "touches" commits without NOT_SHALLOW flag are
considered shallow (e.g. zero parents via grafting mechanism). Shallow
commits and their true parents are all marked SHALLOW. Then NOT_SHALLOW
is removed from shallow commits at the end.

There is interesting observation, though somewhat off topic for this
patch. In the following graph, "x" is unreachable commits. "b" is the
parent of "a".

   x -- a -- o
   //
 x -- b -- o

And as a result, "a" and "b" are both considered shallow commits. After
grafting occurs at the client side, what we see is

a -- o
/
  b -- o

Notice that because of grafting, "a" has zero parents, so "b" is no
longer a parent of "a".

This is unfortunate and may be solved in two ways. The first is change
the way shallow grafting works and keeps "b -- a" connection if "b"
exists and is a shallow commit.

The second way is produce this graph (at client side) instead

   x -- a -- o
   //
  b -- o

Which means we mark "x" as a shallow commit instead of "a".
-- 8< --
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list

2016-02-15 Thread Duy Nguyen
On Tue, Feb 9, 2016 at 4:09 AM, Junio C Hamano  wrote:
>> + is_repository_shallow(); /* make sure shallows are read */
>> +
>> + init_revisions(&revs, NULL);
>> + save_commit_buffer = 0;
>> + setup_revisions(ac, av, &revs, NULL);
>> +
>> + /* Mark all reachable commits as NOT_SHALLOW */
>> + if (prepare_revision_walk(&revs))
>> + die("revision walk setup failed");
>> + traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag);
>> +
>> + /*
>> +  * mark border commits SHALLOW + NOT_SHALLOW.
>> +  * We cannot clear NOT_SHALLOW right now. Imagine border
>> +  * commit A is processed first, then commit B, whose parent is
>> +  * A, later. If NOT_SHALLOW on A is cleared at step 1, B
>> +  * itself is considered border at step 2, which is incorrect.
>> +  */
>> + nr = get_max_object_index();
>> + for (i = 0; i < nr; i++) {
>
> I'd really like not to see a loop over 0..get_max_object_index().
> Are there many codepaths that peek into the in-core entire object
> store already?

You started it with check_non_tip(). At least that's how I know about
this loop. But I think that's the only code path, not counting this.

> Would it work equally well to keep track of the
> commits discovered in show_commit() to use as the set of commits
> you need to visit in this second pass?

We can't do this in show_commit. In this loop, we check
not_shallow_flag of parent commits. If one parent commit is not
show_commit'd yet, the flag is not set and we may incorrectly think
this is a border commit. The only way to avoid going through the
entire in-core object database is keeping a new commit_list and go
through it here. Which way is preferred?

>> + struct object *o = get_indexed_object(i);
>> + struct commit *c = (struct commit *)o;
>> +
>> + if (!o || o->type != OBJ_COMMIT ||
>> + !(o->flags & not_shallow_flag))
>> + continue;
>> +
>> + if (parse_commit(c))
>> + die("unable to parse commit %s",
>> + oid_to_hex(&c->object.oid));
>> +
>> + for (p = c->parents; p; p = p->next)
>> + if (!(p->item->object.flags & not_shallow_flag)) {
>> + o->flags |= shallow_flag;
>> + commit_list_insert(c, &result);
>> + break;
>> + }
>> + }
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list

2016-02-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Instead of a custom commit walker like get_shallow_commits(), this new
> function uses rev-list to mark NOT_SHALLOW to all reachable commits,
> except borders. The definition of reachable is to be defined by the
> protocol later. This makes it more flexible to define shallow boundary.
>
> Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent,
> then it's considered the boundary. Which means in the client side, this
> commit has _no_ parents. This could lead to surprising cuts if we're not
> careful.
>
> Another option is to include more commits and only mark commits whose
> all parents are SHALLOW as boundary.

The second and third are greek to me at this point ;-) but hopefully
they will become clear as we read on.

> +/*
> + * Given rev-list arguments, run rev-list. All reachable commits
> + * except border ones are marked with not_shallow_flag. Border commits
> + * are marked with shallow_flag. The list of border/shallow commits
> + * are also returned.
> + */
> +struct commit_list *get_shallow_commits_by_rev_list(int ac, const char **av,
> + int shallow_flag,
> + int not_shallow_flag)
> +{
> + struct commit_list *result = NULL, *p;
> + struct rev_info revs;
> + unsigned int i, nr;
> +
> + /*
> +  * SHALLOW (excluded) and NOT_SHALLOW (included) should not be
> +  * set at this point. But better be safe than sorry.
> +  */
> + nr = get_max_object_index();
> + for (i = 0; i < nr; i++) {
> + struct object *o = get_indexed_object(i);
> + if (!o || o->type != OBJ_COMMIT)
> + continue;
> + o->flags &= ~(shallow_flag | not_shallow_flag);
> + }

This is slightly different from clear_object_flags(), but I cannot
tell if it is intended, or if you forgot that the function exists.

> + is_repository_shallow(); /* make sure shallows are read */
> +
> + init_revisions(&revs, NULL);
> + save_commit_buffer = 0;
> + setup_revisions(ac, av, &revs, NULL);
> +
> + /* Mark all reachable commits as NOT_SHALLOW */
> + if (prepare_revision_walk(&revs))
> + die("revision walk setup failed");
> + traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag);
> +
> + /*
> +  * mark border commits SHALLOW + NOT_SHALLOW.
> +  * We cannot clear NOT_SHALLOW right now. Imagine border
> +  * commit A is processed first, then commit B, whose parent is
> +  * A, later. If NOT_SHALLOW on A is cleared at step 1, B
> +  * itself is considered border at step 2, which is incorrect.
> +  */
> + nr = get_max_object_index();
> + for (i = 0; i < nr; i++) {

I'd really like not to see a loop over 0..get_max_object_index().
Are there many codepaths that peek into the in-core entire object
store already?  Would it work equally well to keep track of the
commits discovered in show_commit() to use as the set of commits
you need to visit in this second pass?

> + struct object *o = get_indexed_object(i);
> + struct commit *c = (struct commit *)o;
> +
> + if (!o || o->type != OBJ_COMMIT ||
> + !(o->flags & not_shallow_flag))
> + continue;
> +
> + if (parse_commit(c))
> + die("unable to parse commit %s",
> + oid_to_hex(&c->object.oid));
> +
> + for (p = c->parents; p; p = p->next)
> + if (!(p->item->object.flags & not_shallow_flag)) {
> + o->flags |= shallow_flag;
> + commit_list_insert(c, &result);
> + break;
> + }
> + }
> +
> + /*
> +  * Now we can clean up NOT_SHALLOW on border commits. Having
> +  * both flags set can confuse the caller.
> +  */
> + for (p = result; p; p = p->next) {
> + struct object *ro = &p->item->object;

Why "ro" only in this third pass, unlike the other two passes that
said "o" which is in a sense more descriptive?

> + if ((ro->flags & not_shallow_flag) &&
> + (ro->flags & shallow_flag))

If you introduce a "both_flags = shallow_flag | not_shallow_flag"
at the very beginning, this will become

if (o->flags & both_flags)
o->flags &= ~not_shallow_flag;

which would probably be easier to read.  You can pass the same to
clear_object_flags() at the first pass.

> + ro->flags &= ~not_shallow_flag;
> + }
> + return result;
> +}

Other than that, this step looks quite straight-forward to me.

Thanks.

> +
>  static void check_shallow_file_for_update(void)
>  {
>   if (is_shallow == -1)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.or

[PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list

2016-02-04 Thread Nguyễn Thái Ngọc Duy
Instead of a custom commit walker like get_shallow_commits(), this new
function uses rev-list to mark NOT_SHALLOW to all reachable commits,
except borders. The definition of reachable is to be defined by the
protocol later. This makes it more flexible to define shallow boundary.

Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent,
then it's considered the boundary. Which means in the client side, this
commit has _no_ parents. This could lead to surprising cuts if we're not
careful.

Another option is to include more commits and only mark commits whose
all parents are SHALLOW as boundary.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 commit.h  |  2 ++
 shallow.c | 85 +++
 2 files changed, 87 insertions(+)

diff --git a/commit.h b/commit.h
index 5d58be0..b717be1 100644
--- a/commit.h
+++ b/commit.h
@@ -258,6 +258,8 @@ extern int for_each_commit_graft(each_commit_graft_fn, void 
*);
 extern int is_repository_shallow(void);
 extern struct commit_list *get_shallow_commits(struct object_array *heads,
int depth, int shallow_flag, int not_shallow_flag);
+extern struct commit_list *get_shallow_commits_by_rev_list(
+   int ac, const char **av, int shallow_flag, int 
not_shallow_flag);
 extern void set_alternate_shallow_file(const char *path, int override);
 extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol,
 const struct sha1_array *extra);
diff --git a/shallow.c b/shallow.c
index 60f1505..f5d5c1d 100644
--- a/shallow.c
+++ b/shallow.c
@@ -10,6 +10,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "commit-slab.h"
+#include "revision.h"
+#include "list-objects.h"
 
 static int is_shallow = -1;
 static struct stat_validity shallow_stat;
@@ -137,6 +139,89 @@ struct commit_list *get_shallow_commits(struct 
object_array *heads, int depth,
return result;
 }
 
+static void show_commit(struct commit *commit, void *data)
+{
+   commit->object.flags |= *(int *)data;
+}
+
+/*
+ * Given rev-list arguments, run rev-list. All reachable commits
+ * except border ones are marked with not_shallow_flag. Border commits
+ * are marked with shallow_flag. The list of border/shallow commits
+ * are also returned.
+ */
+struct commit_list *get_shallow_commits_by_rev_list(int ac, const char **av,
+   int shallow_flag,
+   int not_shallow_flag)
+{
+   struct commit_list *result = NULL, *p;
+   struct rev_info revs;
+   unsigned int i, nr;
+
+   /*
+* SHALLOW (excluded) and NOT_SHALLOW (included) should not be
+* set at this point. But better be safe than sorry.
+*/
+   nr = get_max_object_index();
+   for (i = 0; i < nr; i++) {
+   struct object *o = get_indexed_object(i);
+   if (!o || o->type != OBJ_COMMIT)
+   continue;
+   o->flags &= ~(shallow_flag | not_shallow_flag);
+   }
+
+   is_repository_shallow(); /* make sure shallows are read */
+
+   init_revisions(&revs, NULL);
+   save_commit_buffer = 0;
+   setup_revisions(ac, av, &revs, NULL);
+
+   /* Mark all reachable commits as NOT_SHALLOW */
+   if (prepare_revision_walk(&revs))
+   die("revision walk setup failed");
+   traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag);
+
+   /*
+* mark border commits SHALLOW + NOT_SHALLOW.
+* We cannot clear NOT_SHALLOW right now. Imagine border
+* commit A is processed first, then commit B, whose parent is
+* A, later. If NOT_SHALLOW on A is cleared at step 1, B
+* itself is considered border at step 2, which is incorrect.
+*/
+   nr = get_max_object_index();
+   for (i = 0; i < nr; i++) {
+   struct object *o = get_indexed_object(i);
+   struct commit *c = (struct commit *)o;
+
+   if (!o || o->type != OBJ_COMMIT ||
+   !(o->flags & not_shallow_flag))
+   continue;
+
+   if (parse_commit(c))
+   die("unable to parse commit %s",
+   oid_to_hex(&c->object.oid));
+
+   for (p = c->parents; p; p = p->next)
+   if (!(p->item->object.flags & not_shallow_flag)) {
+   o->flags |= shallow_flag;
+   commit_list_insert(c, &result);
+   break;
+   }
+   }
+
+   /*
+* Now we can clean up NOT_SHALLOW on border commits. Having
+* both flags set can confuse the caller.
+*/
+   for (p = result; p; p = p->next) {
+   struct object *ro = &p->item->object;
+   if ((ro->flags & not_shallow_flag) &&
+   (ro->flags & shallow_flag))
+