Re: [RFC PATCH 1/2] implement fetching of moved submodules
On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote: > On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigtwrote: > > We store the changed submodules paths to calculate which submodule needs > > fetching. This does not work for moved submodules since their paths do > > not stay the same in case of a moved submodules. In case of new > > submodules we do not have a path in the current checkout, since they > > just appeared in this fetch. > > > > It is more general to collect the submodule names for changes instead of > > their paths to include the above cases. > > > > With the change described above we implement 'on-demand' fetching of > > changes in moved submodules. > > This sounds as if this would also enable fetching new submodules > eventually? Yes that was the goal when starting with these changes back then. But it took more time than I had back then. So instead of letting these changes sit bitrot again lets see if we can get them integrated. For new submodules we need to change the iteration somehow. Currently we are iterating through the index. But new submodules obviously do not have an index entry (otherwise they would not be new). So instead of the index we will need to create another list that contains "all" submodules. Maybe something like: all submodules from the index plus all submodules that changed / are new? We could also go further and inspect all submodules from all ref tips to handle submodules on other branches configured to 'yes'. But I think we should leave that for later if need arises. Some merge of index and additional submodules is needed, because for --recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes we always need to run fetch inside the submodule. That would break if we only looked at submodules that are collected as changed. > > Note: This does only work when repositories have a .gitmodules file. In > > other words: It breaks if we do not get a name for a repository. > > IIRC, consensus was that this is a requirement to get nice submodule > > handling these days? > > I think that should have been the consensus since ~1.7.8 (since the > submodules git dir can live inside the superprojects > /module/). I agree but since we started without it, we kind of have a mixed state. > A gitlink entry without corresponding .gitmodules entry is just a gitlink. > If we happen to have a repository at that path of the gitlink, we can > be nice and pretend like it is a functional submodule, but it really is > not. It's just another repo inside the superproject that happen to live > at the path of a gitlink. Yeah but at the moment we are handling 'on-demand' fetches and stuff for such just gitlink submodules. If we were firm on that requirement we would just skip those but that is not the case with the current implementation. > > Signed-off-by: Heiko Voigt > > --- > > > > I updated the leftover code from my series implementing recursive fetch > > for moved submodules[1] to the current master. > > > > This breaks t5531 and t5545 because they do not use a .gitmodules file. > > > > I also have some code leftover that does fallback on paths in case no > > submodule names can be found. But I do not really like it. The question > > here is how far do we support not using .gitmodules. Is it e.g. > > reasonable to say: "For --recurse-submodules=on-demand you need a > > .gitmodules file?" > > I would not intentionally break users here, but any new functionality can > safely assume (a) we have a proper .gitmodules entry or (b) it is not a > submodule, so do nothing/be extra careful. > > For example in recursive diff sort of makes sense to also handle > non-submodule gitlinks, but fetch is harder to tell. Well we have a few different cases for gitlinks without .gitmodule entry: 1. New gitlink: We can not handle since we do not know where to clone from. 2. Removed gitlink: No need to do anything in fetch 3. Changed (but same name) gitlink: We can / and currently do run fetch in it 4. Renamed: We currently skip those. We could probably do something to track the rename and run fetch in case of gitlink changes. In my current approach only the ones with a name are handled. So I guess I will add a fallback to paths for 3. so we do not unnecessarily break users using the current implementation. > (just last night I was rereading > https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ > which I think is a super cute application of gitlinks. If you happen > to checkout such > a tree, you don't want to fetch all of the fake submodules) > > > > > [1] > > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/ > > Oha, that is from way back in the time. :) Yeah this code did go through some proper bitrotting :) > > submodule.c | 92 >
Re: [RFC PATCH 1/2] implement fetching of moved submodules
On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigtwrote: > We store the changed submodules paths to calculate which submodule needs > fetching. This does not work for moved submodules since their paths do > not stay the same in case of a moved submodules. In case of new > submodules we do not have a path in the current checkout, since they > just appeared in this fetch. > > It is more general to collect the submodule names for changes instead of > their paths to include the above cases. > > With the change described above we implement 'on-demand' fetching of > changes in moved submodules. This sounds as if this would also enable fetching new submodules eventually? > Note: This does only work when repositories have a .gitmodules file. In > other words: It breaks if we do not get a name for a repository. > IIRC, consensus was that this is a requirement to get nice submodule > handling these days? I think that should have been the consensus since ~1.7.8 (since the submodules git dir can live inside the superprojects /module/). A gitlink entry without corresponding .gitmodules entry is just a gitlink. If we happen to have a repository at that path of the gitlink, we can be nice and pretend like it is a functional submodule, but it really is not. It's just another repo inside the superproject that happen to live at the path of a gitlink. > Signed-off-by: Heiko Voigt > --- > > I updated the leftover code from my series implementing recursive fetch > for moved submodules[1] to the current master. > > This breaks t5531 and t5545 because they do not use a .gitmodules file. > > I also have some code leftover that does fallback on paths in case no > submodule names can be found. But I do not really like it. The question > here is how far do we support not using .gitmodules. Is it e.g. > reasonable to say: "For --recurse-submodules=on-demand you need a > .gitmodules file?" I would not intentionally break users here, but any new functionality can safely assume (a) we have a proper .gitmodules entry or (b) it is not a submodule, so do nothing/be extra careful. For example in recursive diff sort of makes sense to also handle non-submodule gitlinks, but fetch is harder to tell. (just last night I was rereading https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ which I think is a super cute application of gitlinks. If you happen to checkout such a tree, you don't want to fetch all of the fake submodules) > > [1] > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/ Oha, that is from way back in the time. :) > submodule.c | 92 > + > t/t5526-fetch-submodules.sh | 35 + > 2 files changed, 86 insertions(+), 41 deletions(-) > > diff --git a/submodule.c b/submodule.c > index 27de65a..3ed78ac 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -23,7 +23,7 @@ > static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND; > static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > static int parallel_jobs = 1; > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; > +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > @@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct > cache_entry *ce) > } > > static struct oid_array *submodule_commits(struct string_list *submodules, > - const char *path) > + const char *name) > { > struct string_list_item *item; > > - item = string_list_insert(submodules, path); > + item = string_list_insert(submodules, name); > if (item->util) > return (struct oid_array *) item->util; > > @@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct > string_list *submodules, > return (struct oid_array *) item->util; > } > > +struct collect_changed_submodules_cb_data { > + struct string_list *changed; Here a comment would be helpful or a more concise variable name. (What is changed?) > + const struct object_id *commit_oid; > +}; > + > static void collect_changed_submodules_cb(struct diff_queue_struct *q, > struct diff_options *options, > void *data) > { > + struct collect_changed_submodules_cb_data *me = data; > + struct string_list *changed = me->changed; > + const struct object_id *commit_oid = me->commit_oid; > int i; > - struct string_list *changed = data; > > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; >
[RFC PATCH 1/2] implement fetching of moved submodules
We store the changed submodules paths to calculate which submodule needs fetching. This does not work for moved submodules since their paths do not stay the same in case of a moved submodules. In case of new submodules we do not have a path in the current checkout, since they just appeared in this fetch. It is more general to collect the submodule names for changes instead of their paths to include the above cases. With the change described above we implement 'on-demand' fetching of changes in moved submodules. Note: This does only work when repositories have a .gitmodules file. In other words: It breaks if we do not get a name for a repository. IIRC, consensus was that this is a requirement to get nice submodule handling these days? Signed-off-by: Heiko Voigt--- I updated the leftover code from my series implementing recursive fetch for moved submodules[1] to the current master. This breaks t5531 and t5545 because they do not use a .gitmodules file. I also have some code leftover that does fallback on paths in case no submodule names can be found. But I do not really like it. The question here is how far do we support not using .gitmodules. Is it e.g. reasonable to say: "For --recurse-submodules=on-demand you need a .gitmodules file?" [1] https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/ submodule.c | 92 + t/t5526-fetch-submodules.sh | 35 + 2 files changed, 86 insertions(+), 41 deletions(-) diff --git a/submodule.c b/submodule.c index 27de65a..3ed78ac 100644 --- a/submodule.c +++ b/submodule.c @@ -23,7 +23,7 @@ static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND; static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; static int parallel_jobs = 1; -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP; +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; @@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct cache_entry *ce) } static struct oid_array *submodule_commits(struct string_list *submodules, - const char *path) + const char *name) { struct string_list_item *item; - item = string_list_insert(submodules, path); + item = string_list_insert(submodules, name); if (item->util) return (struct oid_array *) item->util; @@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct string_list *submodules, return (struct oid_array *) item->util; } +struct collect_changed_submodules_cb_data { + struct string_list *changed; + const struct object_id *commit_oid; +}; + static void collect_changed_submodules_cb(struct diff_queue_struct *q, struct diff_options *options, void *data) { + struct collect_changed_submodules_cb_data *me = data; + struct string_list *changed = me->changed; + const struct object_id *commit_oid = me->commit_oid; int i; - struct string_list *changed = data; for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; struct oid_array *commits; + const struct submodule *submodule; + if (!S_ISGITLINK(p->two->mode)) continue; - if (S_ISGITLINK(p->one->mode)) { - /* -* NEEDSWORK: We should honor the name configured in -* the .gitmodules file of the commit we are examining -* here to be able to correctly follow submodules -* being moved around. -*/ - commits = submodule_commits(changed, p->two->path); - oid_array_append(commits, >two->oid); - } else { - /* Submodule is new or was moved here */ - /* -* NEEDSWORK: When the .git directories of submodules -* live inside the superprojects .git directory some -* day we should fetch new submodules directly into -* that location too when config or options request -* that so they can be checked out from there. -*/ + submodule = submodule_from_path(commit_oid, p->two->path); + if (!submodule) continue; - } + + commits = submodule_commits(changed, submodule->name); +