Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list
On Mon, Feb 08, 2016 at 01:09:24PM -0800, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > > > Instead of a custom commit walker like get_shallow_commits(), this new > > function uses rev-list to mark NOT_SHALLOW to all reachable commits, > > except borders. The definition of reachable is to be defined by the > > protocol later. This makes it more flexible to define shallow boundary. > > > > Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent, > > then it's considered the boundary. Which means in the client side, this > > commit has _no_ parents. This could lead to surprising cuts if we're not > > careful. > > > > Another option is to include more commits and only mark commits whose > > all parents are SHALLOW as boundary. > > The second and third are greek to me at this point ;-) but hopefully > they will become clear as we read on. Yeah. Everything looks clearer with illustration. This should be a better. The question is should we do something about it now, or leave it as is. I'm tempted to go with "the first way" in future (so add some comments about this in is_repository_shallow, instead of leaving it as commit message in this patch). -- 8< -- The way we find find border is paint all reachable commits NOT_SHALLOW. Any of them that "touches" commits without NOT_SHALLOW flag are considered shallow (e.g. zero parents via grafting mechanism). Shallow commits and their true parents are all marked SHALLOW. Then NOT_SHALLOW is removed from shallow commits at the end. There is interesting observation, though somewhat off topic for this patch. In the following graph, "x" is unreachable commits. "b" is the parent of "a". x -- a -- o // x -- b -- o And as a result, "a" and "b" are both considered shallow commits. After grafting occurs at the client side, what we see is a -- o / b -- o Notice that because of grafting, "a" has zero parents, so "b" is no longer a parent of "a". This is unfortunate and may be solved in two ways. The first is change the way shallow grafting works and keeps "b -- a" connection if "b" exists and is a shallow commit. The second way is produce this graph (at client side) instead x -- a -- o // b -- o Which means we mark "x" as a shallow commit instead of "a". -- 8< -- -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list
On Tue, Feb 9, 2016 at 4:09 AM, Junio C Hamano wrote: >> + is_repository_shallow(); /* make sure shallows are read */ >> + >> + init_revisions(&revs, NULL); >> + save_commit_buffer = 0; >> + setup_revisions(ac, av, &revs, NULL); >> + >> + /* Mark all reachable commits as NOT_SHALLOW */ >> + if (prepare_revision_walk(&revs)) >> + die("revision walk setup failed"); >> + traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag); >> + >> + /* >> + * mark border commits SHALLOW + NOT_SHALLOW. >> + * We cannot clear NOT_SHALLOW right now. Imagine border >> + * commit A is processed first, then commit B, whose parent is >> + * A, later. If NOT_SHALLOW on A is cleared at step 1, B >> + * itself is considered border at step 2, which is incorrect. >> + */ >> + nr = get_max_object_index(); >> + for (i = 0; i < nr; i++) { > > I'd really like not to see a loop over 0..get_max_object_index(). > Are there many codepaths that peek into the in-core entire object > store already? You started it with check_non_tip(). At least that's how I know about this loop. But I think that's the only code path, not counting this. > Would it work equally well to keep track of the > commits discovered in show_commit() to use as the set of commits > you need to visit in this second pass? We can't do this in show_commit. In this loop, we check not_shallow_flag of parent commits. If one parent commit is not show_commit'd yet, the flag is not set and we may incorrectly think this is a border commit. The only way to avoid going through the entire in-core object database is keeping a new commit_list and go through it here. Which way is preferred? >> + struct object *o = get_indexed_object(i); >> + struct commit *c = (struct commit *)o; >> + >> + if (!o || o->type != OBJ_COMMIT || >> + !(o->flags & not_shallow_flag)) >> + continue; >> + >> + if (parse_commit(c)) >> + die("unable to parse commit %s", >> + oid_to_hex(&c->object.oid)); >> + >> + for (p = c->parents; p; p = p->next) >> + if (!(p->item->object.flags & not_shallow_flag)) { >> + o->flags |= shallow_flag; >> + commit_list_insert(c, &result); >> + break; >> + } >> + } -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list
Nguyễn Thái Ngọc Duy writes: > Instead of a custom commit walker like get_shallow_commits(), this new > function uses rev-list to mark NOT_SHALLOW to all reachable commits, > except borders. The definition of reachable is to be defined by the > protocol later. This makes it more flexible to define shallow boundary. > > Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent, > then it's considered the boundary. Which means in the client side, this > commit has _no_ parents. This could lead to surprising cuts if we're not > careful. > > Another option is to include more commits and only mark commits whose > all parents are SHALLOW as boundary. The second and third are greek to me at this point ;-) but hopefully they will become clear as we read on. > +/* > + * Given rev-list arguments, run rev-list. All reachable commits > + * except border ones are marked with not_shallow_flag. Border commits > + * are marked with shallow_flag. The list of border/shallow commits > + * are also returned. > + */ > +struct commit_list *get_shallow_commits_by_rev_list(int ac, const char **av, > + int shallow_flag, > + int not_shallow_flag) > +{ > + struct commit_list *result = NULL, *p; > + struct rev_info revs; > + unsigned int i, nr; > + > + /* > + * SHALLOW (excluded) and NOT_SHALLOW (included) should not be > + * set at this point. But better be safe than sorry. > + */ > + nr = get_max_object_index(); > + for (i = 0; i < nr; i++) { > + struct object *o = get_indexed_object(i); > + if (!o || o->type != OBJ_COMMIT) > + continue; > + o->flags &= ~(shallow_flag | not_shallow_flag); > + } This is slightly different from clear_object_flags(), but I cannot tell if it is intended, or if you forgot that the function exists. > + is_repository_shallow(); /* make sure shallows are read */ > + > + init_revisions(&revs, NULL); > + save_commit_buffer = 0; > + setup_revisions(ac, av, &revs, NULL); > + > + /* Mark all reachable commits as NOT_SHALLOW */ > + if (prepare_revision_walk(&revs)) > + die("revision walk setup failed"); > + traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag); > + > + /* > + * mark border commits SHALLOW + NOT_SHALLOW. > + * We cannot clear NOT_SHALLOW right now. Imagine border > + * commit A is processed first, then commit B, whose parent is > + * A, later. If NOT_SHALLOW on A is cleared at step 1, B > + * itself is considered border at step 2, which is incorrect. > + */ > + nr = get_max_object_index(); > + for (i = 0; i < nr; i++) { I'd really like not to see a loop over 0..get_max_object_index(). Are there many codepaths that peek into the in-core entire object store already? Would it work equally well to keep track of the commits discovered in show_commit() to use as the set of commits you need to visit in this second pass? > + struct object *o = get_indexed_object(i); > + struct commit *c = (struct commit *)o; > + > + if (!o || o->type != OBJ_COMMIT || > + !(o->flags & not_shallow_flag)) > + continue; > + > + if (parse_commit(c)) > + die("unable to parse commit %s", > + oid_to_hex(&c->object.oid)); > + > + for (p = c->parents; p; p = p->next) > + if (!(p->item->object.flags & not_shallow_flag)) { > + o->flags |= shallow_flag; > + commit_list_insert(c, &result); > + break; > + } > + } > + > + /* > + * Now we can clean up NOT_SHALLOW on border commits. Having > + * both flags set can confuse the caller. > + */ > + for (p = result; p; p = p->next) { > + struct object *ro = &p->item->object; Why "ro" only in this third pass, unlike the other two passes that said "o" which is in a sense more descriptive? > + if ((ro->flags & not_shallow_flag) && > + (ro->flags & shallow_flag)) If you introduce a "both_flags = shallow_flag | not_shallow_flag" at the very beginning, this will become if (o->flags & both_flags) o->flags &= ~not_shallow_flag; which would probably be easier to read. You can pass the same to clear_object_flags() at the first pass. > + ro->flags &= ~not_shallow_flag; > + } > + return result; > +} Other than that, this step looks quite straight-forward to me. Thanks. > + > static void check_shallow_file_for_update(void) > { > if (is_shallow == -1) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.or
[PATCH v2 14/25] shallow.c: implement a generic shallow boundary finder based on rev-list
Instead of a custom commit walker like get_shallow_commits(), this new function uses rev-list to mark NOT_SHALLOW to all reachable commits, except borders. The definition of reachable is to be defined by the protocol later. This makes it more flexible to define shallow boundary. Note: if a commit has one NOT_SHALLOW parent and one SHALLOW parent, then it's considered the boundary. Which means in the client side, this commit has _no_ parents. This could lead to surprising cuts if we're not careful. Another option is to include more commits and only mark commits whose all parents are SHALLOW as boundary. Signed-off-by: Nguyễn Thái Ngọc Duy --- commit.h | 2 ++ shallow.c | 85 +++ 2 files changed, 87 insertions(+) diff --git a/commit.h b/commit.h index 5d58be0..b717be1 100644 --- a/commit.h +++ b/commit.h @@ -258,6 +258,8 @@ extern int for_each_commit_graft(each_commit_graft_fn, void *); extern int is_repository_shallow(void); extern struct commit_list *get_shallow_commits(struct object_array *heads, int depth, int shallow_flag, int not_shallow_flag); +extern struct commit_list *get_shallow_commits_by_rev_list( + int ac, const char **av, int shallow_flag, int not_shallow_flag); extern void set_alternate_shallow_file(const char *path, int override); extern int write_shallow_commits(struct strbuf *out, int use_pack_protocol, const struct sha1_array *extra); diff --git a/shallow.c b/shallow.c index 60f1505..f5d5c1d 100644 --- a/shallow.c +++ b/shallow.c @@ -10,6 +10,8 @@ #include "diff.h" #include "revision.h" #include "commit-slab.h" +#include "revision.h" +#include "list-objects.h" static int is_shallow = -1; static struct stat_validity shallow_stat; @@ -137,6 +139,89 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth, return result; } +static void show_commit(struct commit *commit, void *data) +{ + commit->object.flags |= *(int *)data; +} + +/* + * Given rev-list arguments, run rev-list. All reachable commits + * except border ones are marked with not_shallow_flag. Border commits + * are marked with shallow_flag. The list of border/shallow commits + * are also returned. + */ +struct commit_list *get_shallow_commits_by_rev_list(int ac, const char **av, + int shallow_flag, + int not_shallow_flag) +{ + struct commit_list *result = NULL, *p; + struct rev_info revs; + unsigned int i, nr; + + /* +* SHALLOW (excluded) and NOT_SHALLOW (included) should not be +* set at this point. But better be safe than sorry. +*/ + nr = get_max_object_index(); + for (i = 0; i < nr; i++) { + struct object *o = get_indexed_object(i); + if (!o || o->type != OBJ_COMMIT) + continue; + o->flags &= ~(shallow_flag | not_shallow_flag); + } + + is_repository_shallow(); /* make sure shallows are read */ + + init_revisions(&revs, NULL); + save_commit_buffer = 0; + setup_revisions(ac, av, &revs, NULL); + + /* Mark all reachable commits as NOT_SHALLOW */ + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + traverse_commit_list(&revs, show_commit, NULL, ¬_shallow_flag); + + /* +* mark border commits SHALLOW + NOT_SHALLOW. +* We cannot clear NOT_SHALLOW right now. Imagine border +* commit A is processed first, then commit B, whose parent is +* A, later. If NOT_SHALLOW on A is cleared at step 1, B +* itself is considered border at step 2, which is incorrect. +*/ + nr = get_max_object_index(); + for (i = 0; i < nr; i++) { + struct object *o = get_indexed_object(i); + struct commit *c = (struct commit *)o; + + if (!o || o->type != OBJ_COMMIT || + !(o->flags & not_shallow_flag)) + continue; + + if (parse_commit(c)) + die("unable to parse commit %s", + oid_to_hex(&c->object.oid)); + + for (p = c->parents; p; p = p->next) + if (!(p->item->object.flags & not_shallow_flag)) { + o->flags |= shallow_flag; + commit_list_insert(c, &result); + break; + } + } + + /* +* Now we can clean up NOT_SHALLOW on border commits. Having +* both flags set can confuse the caller. +*/ + for (p = result; p; p = p->next) { + struct object *ro = &p->item->object; + if ((ro->flags & not_shallow_flag) && + (ro->flags & shallow_flag)) +