Re: [PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched
On Fri, Oct 26, 2018 at 1:41 PM Jonathan Tan wrote:
>
> > But this default fetch is not sufficient, as a newly fetched commit in
> > the superproject could point to a commit in the submodule that is not
> > in the default refspec. This is common in workflows like Gerrit's.
> > When fetching a Gerrit change under review (from refs/changes/??), the
> > commits in that change likely point to submodule commits that have not
> > been merged to a branch yet.
> >
> > Try fetching a submodule by object id if the object id that the
> > superproject points to, cannot be found.
>
> I see that these suggestions of mine (from [1]) were implemented, but not
> others. If you disagree, that's fine, but I think they should be
> discussed.

ok.

>
> > -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> > -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> > +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>
> I think the next patch should be squashed into this patch. Then you can
> say that these are now redundant and can be removed.

ok.

>
> > @@ -1218,8 +1218,12 @@ struct submodule_parallel_fetch {
> >  	int result;
> >
> >  	struct string_list changed_submodule_names;
> > +	struct get_next_submodule_task **fetch_specific_oids;
> > +	int fetch_specific_oids_nr, fetch_specific_oids_alloc;
> >  };
>
> Add documentation for fetch_specific_oids. Also, it might be better to
> call these oid_fetch_tasks and the struct, "struct fetch_task".

ok.

> Here, struct get_next_submodule_task is used for 2 different things:
>  (1) After the first round fetch, fetch_finish() uses it to determine if
>      a second round is needed.
>  (2) In submodule_parallel_fetch.fetch_specific_oids, to tell the
>      parallel runner (through get_next_submodule_task()) to do this
>      fetch.
>
> I think that it's better to have 2 different structs for these 2
> different uses. (Note that task_cb can be NULL for the second round.
> Unless we plan to recheck the OIDs? Currently we recheck them, but we
> don't do anything either way.)

I think it is easier to have only one struct until we have
substantially more to communicate. (1) is a boolean answer, for which
(non-)NULL is sufficient.

> I think that this is best described as the submodule that has no entry
> in .gitmodules? Maybe call it "get_non_gitmodules_submodule" and
> document it with a similar comment as in get_submodule_repo_for().

done.

>
> > +
> > +static struct get_next_submodule_task *get_next_submodule_task_create(
> > +	struct repository *r, const char *path)
> > +{
> > +	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
> > +	memset(task, 0, sizeof(*task));
> > +
> > +	task->sub = submodule_from_path(r, &null_oid, path);
> > +	if (!task->sub) {
> > +		task->sub = get_default_submodule(path);
> > +		task->free_sub = 1;
> > +	}
> > +
> > +	return task;
> > +}
>
> Clearer to me would be to make get_next_submodule_task_create() return
> NULL if submodule_from_path() returns NULL.

I doubled down on this one: the function now returns NULL when
get_default_submodule (now renamed to get_non_gitmodules_submodule)
returns NULL. That lets us move the free() out of get_next_submodule
and into the create function, so that in get_next_submodule we just have

    task = fetch_task_create(spf->r, ce->name);
    if (!task)
        continue;

which helps get_next_submodule stay readable.

> Same comment about "on-demand" as in my previous e-mail.

I'd want to push back on refactoring and defer that to a later series.

> Break lines to 80.
[...]
> Same comment about "s--h" as in my previous e-mail.

done

> > +	/* Are there commits that do not exist? */
> > +	if (commits->nr) {
> > +		/* We already tried fetching them, do not try again. */
> > +		if (task->commits)
> > +			return 0;
>
> Same comment about "task->commits" as in my previous e-mail.

Good call. I reordered the function so that it reads more easily, and
added a comment on each early return explaining how it can happen.

>
> > diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> > index 6c2f9b2ba2..5a75b57852 100755
>
> One more thing to test is the case where a submodule doesn't have a
> .gitmodules entry.

added a test.

I just re-sent the series.

Stefan
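The "return NULL from the create function" shape discussed in this reply can be illustrated with a minimal, self-contained sketch. Everything below is an assumption made for illustration: the struct layouts, lookup_in_gitmodules(), and the fallback body are hypothetical stand-ins rather than git's actual definitions, and calloc() replaces git's xmalloc()/memset() pair. Only the names fetch_task_create and get_non_gitmodules_submodule are taken from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's struct submodule (not the real layout). */
struct submodule {
	const char *path;
	const char *name;
};

/* Hypothetical task struct, named after the rename suggested in review. */
struct fetch_task {
	const struct submodule *sub;
	unsigned free_sub : 1; /* set when we allocated sub ourselves */
};

/* Stand-in lookup: pretend only "known" has a .gitmodules entry. */
static const struct submodule *lookup_in_gitmodules(const char *path)
{
	static const struct submodule known = { "known", "known" };
	return strcmp(path, "known") ? NULL : &known;
}

/*
 * Stand-in fallback for a submodule without a .gitmodules entry.
 * Assumption: it fails for an empty path. The returned struct borrows
 * the caller's string, so the caller must keep it alive.
 */
static const struct submodule *get_non_gitmodules_submodule(const char *path)
{
	struct submodule *ret;

	if (!*path)
		return NULL;
	ret = calloc(1, sizeof(*ret));
	if (!ret)
		return NULL;
	ret->path = path;
	ret->name = path;
	return ret;
}

/* Frees the task and, if we own it, the fallback submodule. */
static void fetch_task_release(struct fetch_task *task)
{
	if (!task)
		return;
	if (task->free_sub)
		free((void *)task->sub);
	free(task);
}

/* Returns NULL when no submodule can be determined, so callers can skip. */
static struct fetch_task *fetch_task_create(const char *path)
{
	struct fetch_task *task;

	assert(path != NULL);
	task = calloc(1, sizeof(*task));
	if (!task)
		return NULL;

	task->sub = lookup_in_gitmodules(path);
	if (!task->sub) {
		task->sub = get_non_gitmodules_submodule(path);
		if (!task->sub) {
			free(task); /* cleanup lives here, not in the caller */
			return NULL;
		}
		task->free_sub = 1;
	}
	return task;
}
```

With this shape, a caller iterating over index entries only needs the `if (!task) continue;` check quoted above; it never has to free a half-built task itself.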
Re: [PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched
> But this default fetch is not sufficient, as a newly fetched commit in
> the superproject could point to a commit in the submodule that is not
> in the default refspec. This is common in workflows like Gerrit's.
> When fetching a Gerrit change under review (from refs/changes/??), the
> commits in that change likely point to submodule commits that have not
> been merged to a branch yet.
>
> Try fetching a submodule by object id if the object id that the
> superproject points to, cannot be found.

I see that these suggestions of mine (from [1]) were implemented, but not
others. If you disagree, that's fine, but I think they should be
discussed.

[1] https://public-inbox.org/git/20181018003954.139498-1-jonathanta...@google.com/

> The try does not happen when the "git fetch" done at the
> superproject is not storing the fetched results in remote
> tracking branches (i.e. instead just recording them to
> FETCH_HEAD) in this step. A later patch will fix this.

E.g. here, I said that there was no remote tracking branch involved.

> -		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> -		    (recurse_submodules != RECURSE_SUBMODULES_ON))
> +		if (recurse_submodules != RECURSE_SUBMODULES_OFF)

I think the next patch should be squashed into this patch. Then you can
say that these are now redundant and can be removed.

> @@ -1218,8 +1218,12 @@ struct submodule_parallel_fetch {
>  	int result;
>
>  	struct string_list changed_submodule_names;
> +	struct get_next_submodule_task **fetch_specific_oids;
> +	int fetch_specific_oids_nr, fetch_specific_oids_alloc;
>  };

Add documentation for fetch_specific_oids. Also, it might be better to
call these oid_fetch_tasks and the struct, "struct fetch_task".

Here, struct get_next_submodule_task is used for 2 different things:
 (1) After the first round fetch, fetch_finish() uses it to determine if
     a second round is needed.
 (2) In submodule_parallel_fetch.fetch_specific_oids, to tell the
     parallel runner (through get_next_submodule_task()) to do this
     fetch.

I think that it's better to have 2 different structs for these 2
different uses. (Note that task_cb can be NULL for the second round.
Unless we plan to recheck the OIDs? Currently we recheck them, but we
don't do anything either way.)

> +static const struct submodule *get_default_submodule(const char *path)
> +{
> +	struct submodule *ret = NULL;
> +	const char *name = default_name_or_path(path);
> +
> +	if (!name)
> +		return NULL;
> +
> +	ret = xmalloc(sizeof(*ret));
> +	memset(ret, 0, sizeof(*ret));
> +	ret->path = name;
> +	ret->name = name;
> +
> +	return (const struct submodule *) ret;
> +}

I think that this is best described as the submodule that has no entry
in .gitmodules? Maybe call it "get_non_gitmodules_submodule" and
document it with a similar comment as in get_submodule_repo_for().

> +
> +static struct get_next_submodule_task *get_next_submodule_task_create(
> +	struct repository *r, const char *path)
> +{
> +	struct get_next_submodule_task *task = xmalloc(sizeof(*task));
> +	memset(task, 0, sizeof(*task));
> +
> +	task->sub = submodule_from_path(r, &null_oid, path);
> +	if (!task->sub) {
> +		task->sub = get_default_submodule(path);
> +		task->free_sub = 1;
> +	}
> +
> +	return task;
> +}

Clearer to me would be to make get_next_submodule_task_create() return
NULL if submodule_from_path() returns NULL.

> +	if (spf->fetch_specific_oids_nr) {
> +		struct get_next_submodule_task *task =
> spf->fetch_specific_oids[spf->fetch_specific_oids_nr - 1];

Break lines to 80.

> +		argv_array_pushv(&cp->args, spf->args.argv);
> +		argv_array_push(&cp->args, "on-demand");

Same comment about "on-demand" as in my previous e-mail.

> +		argv_array_push(&cp->args, "--submodule-prefix");
> +		argv_array_push(&cp->args, submodule_prefix.buf);
> +
> +		/* NEEDSWORK: have get_default_remote from s--h */

Same comment about "s--h" as in my previous e-mail.

> +	commits = it->util;
> +	oid_array_filter(commits,
> +			 commit_exists_in_sub,
> +			 task->repo);
> +
> +	/* Are there commits that do not exist? */
> +	if (commits->nr) {
> +		/* We already tried fetching them, do not try again. */
> +		if (task->commits)
> +			return 0;

Same comment about "task->commits" as in my previous e-mail.

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 6c2f9b2ba2..5a75b57852 100755

One more thing to test is the case where a submodule doesn't have a
.gitmodules entry.
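The second-round decision that the last quoted hunk makes (drop the commits that are now present, and retry by object id at most once) can be sketched on its own as follows. This is a simplified illustration under stated assumptions, not git's code: object ids are modelled as plain ints, and struct id_list, drop_existing(), after_fetch(), and the demo predicate id_is_even() are all hypothetical names. In particular, drop_existing() is not claimed to match the semantics of git's oid_array_filter().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list of object ids (ints for brevity). */
struct id_list {
	int *ids;
	size_t nr;
};

/* Predicate: nonzero if the object is already present locally. */
typedef int (*exists_fn)(int id, void *data);

/* Compacts the list in place so that only missing ids remain. */
static void drop_existing(struct id_list *l, exists_fn exists, void *data)
{
	size_t src, dst = 0;

	assert(l != NULL && exists != NULL);
	for (src = 0; src < l->nr; src++)
		if (!exists(l->ids[src], data))
			l->ids[dst++] = l->ids[src];
	l->nr = dst;
}

enum next_step {
	DONE,        /* everything the superproject needs is present */
	FETCH_BY_ID, /* schedule one extra fetch naming the missing ids */
	GIVE_UP      /* the by-id fetch already ran; do not loop */
};

/* Decides what to do after a fetch of one submodule has finished. */
static enum next_step after_fetch(struct id_list *wanted,
				  int already_tried_by_id,
				  exists_fn exists, void *data)
{
	drop_existing(wanted, exists, data);
	if (!wanted->nr)
		return DONE;
	return already_tried_by_id ? GIVE_UP : FETCH_BY_ID;
}

/* Demo predicate only: pretend even-numbered ids exist locally. */
static int id_is_even(int id, void *data)
{
	(void)data;
	return id % 2 == 0;
}
```

Splitting the outcome into three named results makes each early return explicit, which is one way to address the review comment about the bare `return 0` after the `task->commits` check.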
[PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options such as prefix
and recursing strategy, but) without any information of the remote or the
tip that should be fetched.

But this default fetch is not sufficient, as a newly fetched commit in
the superproject could point to a commit in the submodule that is not
in the default refspec. This is common in workflows like Gerrit's.
When fetching a Gerrit change under review (from refs/changes/??), the
commits in that change likely point to submodule commits that have not
been merged to a branch yet.

Try fetching a submodule by object id if the object id that the
superproject points to, cannot be found.

The try does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller
---
 builtin/fetch.c             |   9 +-
 submodule.c                 | 192 ++--
 t/t5526-fetch-submodules.sh |  31 ++
 3 files changed, 198 insertions(+), 34 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..95c44bf6ff 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
 			what = _("[new ref]");
 		}
 
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref(msg, ref, 0);
 		format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "..");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("fast-forward", ref, 1);
 		format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
 		strbuf_add_unique_abbrev(&quickref, &current->object.oid, DEFAULT_ABBREV);
 		strbuf_addstr(&quickref, "...");
 		strbuf_add_unique_abbrev(&quickref, &ref->new_oid, DEFAULT_ABBREV);
-		if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-		    (recurse_submodules != RECURSE_SUBMODULES_ON))
+		if (recurse_submodules != RECURSE_SUBMODULES_OFF)
 			check_for_new_submodule_commits(&ref->new_oid);
 		r = s_update_ref("forced-update", ref, 1);
 		format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 67813fbe78..c978a38c81 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1218,8 +1218,12 @@ struct submodule_parallel_fetch {
 	int result;
 
 	struct string_list changed_submodule_names;
+	struct get_next_submodule_task **fetch_specific_oids;
+	int fetch_specific_oids_nr, fetch_specific_oids_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+		  STRING_LIST_INIT_DUP, \
+		  NULL, 0, 0}
 
 static int get_fetch_recurse_config(const struct submodule *submodule,
 				    struct submodule_parallel_fetch *spf)
@@ -1246,6 +1250,58 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
 	return spf->default_option;
 }
 
+struct get_next_submodule_task {
+	struct repository *repo;
+	const struct submodule *sub;
+	unsigned free_sub : 1; /* Do we need to free the submodule? */
+
+	/* fetch specific oids if set, otherwise fetch default refspec */
+	struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+	struct submodule *ret = NULL;
+	const char *name = default_name_or_path(path);
+
+	if (!name)
+		return NULL;
+
+	ret = xmalloc(sizeof(*ret));
+	memset(ret, 0,