Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-18 Thread Heiko Voigt
On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt  wrote:
> > We store the changed submodules paths to calculate which submodule needs
> > fetching. This does not work for moved submodules since their paths do
> > not stay the same in case of a moved submodules. In case of new
> > submodules we do not have a path in the current checkout, since they
> > just appeared in this fetch.
> >
> > It is more general to collect the submodule names for changes instead of
> > their paths to include the above cases.
> >
> > With the change described above we implement 'on-demand' fetching of
> > changes in moved submodules.
> 
> This sounds as if this would also enable fetching new submodules
> eventually?

Yes that was the goal when starting with these changes back then. But it
took more time than I had back then. So instead of letting these changes
sit bitrot again lets see if we can get them integrated.

For new submodules we need to change the iteration somehow. Currently we
are iterating through the index. But new submodules obviously do not
have an index entry (otherwise they would not be new). So instead of the
index we will need to create another list that contains "all"
submodules. Maybe something like: all submodules from the index plus all
submodules that changed / are new? We could also go further and inspect
all submodules from all ref tips to handle submodules on other branches
configured to 'yes'. But I think we should leave that for later if need
arises.

Some merge of index and additional submodules is needed, because for
--recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes
we always need to run fetch inside the submodule. That would break if we
only looked at submodules that are collected as changed.

> > Note: This does only work when repositories have a .gitmodules file. In
> > other words: It breaks if we do not get a name for a repository.
> > IIRC, consensus was that this is a requirement to get nice submodule
> > handling these days?
> 
> I think that should have been the consensus since ~1.7.8 (since the
> submodules git dir can live inside the superprojects
> /module/).

I agree but since we started without it, we kind of have a mixed state.

> A gitlink entry without corresponding .gitmodules entry is just a gitlink.
> If we happen to have a repository at that path of the gitlink, we can
> be nice and pretend like it is a functional submodule, but it really is
> not. It's just another repo inside the superproject that happen to live
> at the path of a gitlink.

Yeah but at the moment we are handling 'on-demand' fetches and stuff for
such just gitlink submodules. If we were firm on that requirement we
would just skip those but that is not the case with the current
implementation.

> > Signed-off-by: Heiko Voigt 
> > ---
> >
> > I updated the leftover code from my series implementing recursive fetch
> > for moved submodules[1] to the current master.
> >
> > This breaks t5531 and t5545 because they do not use a .gitmodules file.
> >
> > I also have some code leftover that does fallback on paths in case no
> > submodule names can be found. But I do not really like it. The question
> > here is how far do we support not using .gitmodules. Is it e.g.
> > reasonable to say: "For --recurse-submodules=on-demand you need a
> > .gitmodules file?"
> 
> I would not intentionally break users here, but any new functionality can
> safely assume (a) we have a proper .gitmodules entry or (b) it is not a
> submodule, so do nothing/be extra careful.
> 
> For example in recursive diff sort of makes sense to also handle
> non-submodule gitlinks, but fetch is harder to tell.

Well we have a few different cases for gitlinks without .gitmodule
entry:

 1. New gitlink: We can not handle since we do not know where to clone
 from.

 2. Removed gitlink: No need to do anything in fetch

 3. Changed (but same name) gitlink: We can / and currently do run fetch
 in it

 4. Renamed: We currently skip those. We could probably do something to
 track the rename and run fetch in case of gitlink changes.
 In my current approach only the ones with a name are
 handled.

So I guess I will add a fallback to paths for 3. so we do not
unnecessarily break users using the current implementation.

> (just last night I was rereading
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> which I think is a super cute application of gitlinks. If you happen
> to checkout such
> a tree, you don't want to fetch all of the fake submodules)
> 
> >
> > [1] 
> > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/
> 
> Oha, that is from way back in the time. :)

Yeah this code did go through some proper bitrotting :)

> >  submodule.c | 92 
> 

Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Stefan Beller
On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt  wrote:
> We store the changed submodules paths to calculate which submodule needs
> fetching. This does not work for moved submodules since their paths do
> not stay the same in case of a moved submodules. In case of new
> submodules we do not have a path in the current checkout, since they
> just appeared in this fetch.
>
> It is more general to collect the submodule names for changes instead of
> their paths to include the above cases.
>
> With the change described above we implement 'on-demand' fetching of
> changes in moved submodules.

This sounds as if this would also enable fetching new submodules
eventually?

> Note: This does only work when repositories have a .gitmodules file. In
> other words: It breaks if we do not get a name for a repository.
> IIRC, consensus was that this is a requirement to get nice submodule
> handling these days?

I think that should have been the consensus since ~1.7.8 (since the
submodules git dir can live inside the superprojects
/module/).

A gitlink entry without corresponding .gitmodules entry is just a gitlink.
If we happen to have a repository at that path of the gitlink, we can
be nice and pretend like it is a functional submodule, but it really is
not. It's just another repo inside the superproject that happen to live
at the path of a gitlink.

> Signed-off-by: Heiko Voigt 
> ---
>
> I updated the leftover code from my series implementing recursive fetch
> for moved submodules[1] to the current master.
>
> This breaks t5531 and t5545 because they do not use a .gitmodules file.
>
> I also have some code leftover that does fallback on paths in case no
> submodule names can be found. But I do not really like it. The question
> here is how far do we support not using .gitmodules. Is it e.g.
> reasonable to say: "For --recurse-submodules=on-demand you need a
> .gitmodules file?"

I would not intentionally break users here, but any new functionality can
safely assume (a) we have a proper .gitmodules entry or (b) it is not a
submodule, so do nothing/be extra careful.

For example in recursive diff sort of makes sense to also handle
non-submodule gitlinks, but fetch is harder to tell.

(just last night I was rereading
https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
which I think is a super cute application of gitlinks. If you happen
to checkout such
a tree, you don't want to fetch all of the fake submodules)

>
> [1] 
> https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

Oha, that is from way back in the time. :)

>  submodule.c | 92 
> +
>  t/t5526-fetch-submodules.sh | 35 +
>  2 files changed, 86 insertions(+), 41 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 27de65a..3ed78ac 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -23,7 +23,7 @@
>  static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
>  static int parallel_jobs = 1;
> -static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
> +static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
> cache_entry *ce)
>  }
>
>  static struct oid_array *submodule_commits(struct string_list *submodules,
> -  const char *path)
> +  const char *name)
>  {
> struct string_list_item *item;
>
> -   item = string_list_insert(submodules, path);
> +   item = string_list_insert(submodules, name);
> if (item->util)
> return (struct oid_array *) item->util;
>
> @@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
> string_list *submodules,
> return (struct oid_array *) item->util;
>  }
>
> +struct collect_changed_submodules_cb_data {
> +   struct string_list *changed;

Here a comment would be helpful or a more concise variable name.
(What is changed?)

> +   const struct object_id *commit_oid;
> +};
> +
>  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
>   struct diff_options *options,
>   void *data)
>  {
> +   struct collect_changed_submodules_cb_data *me = data;
> +   struct string_list *changed = me->changed;
> +   const struct object_id *commit_oid = me->commit_oid;
> int i;
> -   struct string_list *changed = data;
>
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> 

[RFC PATCH 1/2] implement fetching of moved submodules

2017-08-17 Thread Heiko Voigt
We store the changed submodules paths to calculate which submodule needs
fetching. This does not work for moved submodules since their paths do
not stay the same in case of a moved submodules. In case of new
submodules we do not have a path in the current checkout, since they
just appeared in this fetch.

It is more general to collect the submodule names for changes instead of
their paths to include the above cases.

With the change described above we implement 'on-demand' fetching of
changes in moved submodules.

Note: This does only work when repositories have a .gitmodules file. In
other words: It breaks if we do not get a name for a repository.
IIRC, consensus was that this is a requirement to get nice submodule
handling these days?

Signed-off-by: Heiko Voigt 
---

I updated the leftover code from my series implementing recursive fetch
for moved submodules[1] to the current master.

This breaks t5531 and t5545 because they do not use a .gitmodules file.

I also have some code leftover that does fallback on paths in case no
submodule names can be found. But I do not really like it. The question
here is how far do we support not using .gitmodules. Is it e.g.
reasonable to say: "For --recurse-submodules=on-demand you need a
.gitmodules file?"

[1] 
https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/

 submodule.c | 92 +
 t/t5526-fetch-submodules.sh | 35 +
 2 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/submodule.c b/submodule.c
index 27de65a..3ed78ac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -23,7 +23,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
+static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -742,11 +742,11 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
 }
 
 static struct oid_array *submodule_commits(struct string_list *submodules,
-  const char *path)
+  const char *name)
 {
struct string_list_item *item;
 
-   item = string_list_insert(submodules, path);
+   item = string_list_insert(submodules, name);
if (item->util)
return (struct oid_array *) item->util;
 
@@ -755,39 +755,34 @@ static struct oid_array *submodule_commits(struct 
string_list *submodules,
return (struct oid_array *) item->util;
 }
 
+struct collect_changed_submodules_cb_data {
+   struct string_list *changed;
+   const struct object_id *commit_oid;
+};
+
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
  struct diff_options *options,
  void *data)
 {
+   struct collect_changed_submodules_cb_data *me = data;
+   struct string_list *changed = me->changed;
+   const struct object_id *commit_oid = me->commit_oid;
int i;
-   struct string_list *changed = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
struct oid_array *commits;
+   const struct submodule *submodule;
+
if (!S_ISGITLINK(p->two->mode))
continue;
 
-   if (S_ISGITLINK(p->one->mode)) {
-   /*
-* NEEDSWORK: We should honor the name configured in
-* the .gitmodules file of the commit we are examining
-* here to be able to correctly follow submodules
-* being moved around.
-*/
-   commits = submodule_commits(changed, p->two->path);
-   oid_array_append(commits, >two->oid);
-   } else {
-   /* Submodule is new or was moved here */
-   /*
-* NEEDSWORK: When the .git directories of submodules
-* live inside the superprojects .git directory some
-* day we should fetch new submodules directly into
-* that location too when config or options request
-* that so they can be checked out from there.
-*/
+   submodule = submodule_from_path(commit_oid, p->two->path);
+   if (!submodule)
continue;
-   }
+
+   commits = submodule_commits(changed, submodule->name);
+