Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-23 Thread Stefan Beller
On Tue, Oct 23, 2018 at 3:55 PM Jonathan Tan  wrote:
> > When adding the comment here, we'd also want to have
> > the comment in prepare_submodule_repo_env, which
> > could be its own preparation commit.
>
> I agree with the protection. As for the preparation commit, I don't
> think it's always the code author's responsibility to tidy up the
> surrounding code, but since you're adding an identical comment here,
> it's probably worth it to add the comment there too.

Am I the only one who dislikes inconsistent files? ;-)
(ie. clean in one place, not cleaned up in another)
I can see your point. Will add a comment

> > Thinking of that, maybe we need to announce that in get_next_submodule
>
> The consequence of getting caught changes, though. Currently,
> spf->result is set to 1 whenever a child process fails. But in this
> patch, some of these repositories would be entirely skipped, meaning
> that no child process is run, and spf->result is never modified.

Right.

> > If the working tree directory is empty for that submodule, it means
> > it is likely not initialized. But why would we use that as a signal to
> > skip the submodule?
>
> What I meant was: if empty, skip it completely. Otherwise, do the
> repo_submodule_init() and repo_init() thing, and if they both fail, set
> spf->result to 1, preserving existing behavior.

I did it the other way round:

If repo_[submodule_]init fails, see if we have a gitlink in tree and
an empty dir in the FS, to decide if we need to signal failure.

I can switch it around again, but it seemed easier to write as
that puts corner cases away into one else {} case.


Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-23 Thread Jonathan Tan
> > Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
> > checking the parent directories in case the CWD is a malformed Git
> > repository? If yes, maybe it's worth adding a comment.
> 
> It is copying the structure from prepare_submodule_repo_env,
> specifically 10f5c52656 (submodule: avoid auto-discovery in
> prepare_submodule_repo_env(), 2016-09-01), which sounds
> appealing (and brings real benefits for the working directory),
> but I have not thought about this protection for the git dir.
> 
> Maybe another approach is to not set the cwd for the child process
> and instead point GIT_DIR_ENVIRONMENT only to the right
> directory.
> 
> Then the use of GIT_DIR_ENVIRONMENT is obvious and
> is not just for protection of corner cases.
> 
> However I think this protection is really valuable for the
> .git dir as well as the submodule may be broken and we do not
> want to end up in an infinite loop (as the discovery would find
> the superproject which then tries to recurse, again, into the
> submodule with the broken git dir)
> 
> When adding the comment here, we'd also want to have
> the comment in prepare_submodule_repo_env, which
> could be its own preparation commit.

I agree with the protection. As for the preparation commit, I don't
think it's always the code author's responsibility to tidy up the
surrounding code, but since you're adding an identical comment here,
it's probably worth it to add the comment there too.

> > This is the significant thing that this patch does more - an unskipped
> > submodule is now something that either passes the checks in
> > repo_submodule_init() or the checks in repo_init(), which seems to be
> > stricter than the current check that ".git" points to a directory or is
> > one. This means that we skip certain broken repositories, and this
> > necessitates a change in the test.
> 
> I see. However there is no change in function, the check in repo_init
> (or repo_submodule_init) is less strict than the check in the child process.
> So if there are broken submodule repositories, the difference of this
> patch is the layer at which it is caught, i.e. we would not spawn a child
> that fails, but skip the submodule.
> 
> Thinking of that, maybe we need to announce that in get_next_submodule

The consequence of getting caught changes, though. Currently,
spf->result is set to 1 whenever a child process fails. But in this
patch, some of these repositories would be entirely skipped, meaning
that no child process is run, and spf->result is never modified.

> > I think we should be more particular about what we're allowed to skip -
> > in particular, maybe if we're planning to skip this submodule, its
> > corresponding directory in the worktree (if one exists) needs to be
> > empty.
> 
> If the working tree directory is empty for that submodule, it means
> it is likely not initialized. But why would we use that as a signal to
> skip the submodule?

What I meant was: if empty, skip it completely. Otherwise, do the
repo_submodule_init() and repo_init() thing, and if they both fail, set
spf->result to 1, preserving existing behavior.


Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-23 Thread Stefan Beller
On Wed, Oct 17, 2018 at 3:58 PM Jonathan Tan  wrote:
>
> > This patch started as a refactoring to make 'get_next_submodule' more
> > readable, but upon doing so, I realized that "git fetch" of the submodule
> > actually doesn't need to be run in the submodules worktree. So let's run
> > it in its git dir instead.
>
> The commit message needs to be updated, I think - this patch does
> significantly more than fetching in the gitdir.

>From my point of view, it is not significant, but refactoring.
I'll think how to write a better commit message.

> > This patch leaks the cp->dir in get_next_submodule, as any further
> > callback in run_processes_parallel doesn't have access to the child
> > process any more.
>
> The cp->dir is already leaked - probably better to write "cp->dir in
> get_next_submodule() is still leaked, but this will be fixed in a
> subsequent patch".

... which fails to mention the reason why (as it is hard to do given
the current design) but is more concise.

> > +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> > +{
> > + prepare_submodule_repo_env_no_git_dir(out);
> > + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
>
> Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
> checking the parent directories in case the CWD is a malformed Git
> repository? If yes, maybe it's worth adding a comment.

It is copying the structure from prepare_submodule_repo_env,
specifically 10f5c52656 (submodule: avoid auto-discovery in
prepare_submodule_repo_env(), 2016-09-01), which sounds
appealing (and brings real benefits for the working directory),
but I have not thought about this protection for the git dir.

Maybe another approach is to not set the cwd for the child process
and instead point GIT_DIR_ENVIRONMENT only to the right
directory.

Then the use of GIT_DIR_ENVIRONMENT is obvious and
is not just for protection of corner cases.

However I think this protection is really valuable for the
.git dir as well as the submodule may be broken and we do not
want to end up in an infinite loop (as the discovery would find
the superproject which then tries to recurse, again, into the
submodule with the broken git dir)

When adding the comment here, we'd also want to have
the comment in prepare_submodule_repo_env, which
could be its own preparation commit.

> > +static struct repository *get_submodule_repo_for(struct repository *r,
> > +  const struct submodule *sub)
> > +{
> > + struct repository *ret = xmalloc(sizeof(*ret));
> > +
> > + if (repo_submodule_init(ret, r, sub)) {
> > + /*
> > +  * No entry in .gitmodules? Technically not a submodule,
> > +  * but historically we supported repositories that happen to 
> > be
> > +  * in-place where a gitlink is. Keep supporting them.
> > +  */
> > + struct strbuf gitdir = STRBUF_INIT;
> > + strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
> > + if (repo_init(ret, gitdir.buf, NULL)) {
> > + strbuf_release();
> > + return NULL;
> > + }
> > + strbuf_release();
> > + }
> > +
> > + return ret;
> > +}
>
> This is the significant thing that this patch does more - an unskipped
> submodule is now something that either passes the checks in
> repo_submodule_init() or the checks in repo_init(), which seems to be
> stricter than the current check that ".git" points to a directory or is
> one. This means that we skip certain broken repositories, and this
> necessitates a change in the test.

I see. However there is no change in function, the check in repo_init
(or repo_submodule_init) is less strict than the check in the child process.
So if there are broken submodule repositories, the difference of this
patch is the layer at which it is caught, i.e. we would not spawn a child
that fails, but skip the submodule.

Thinking of that, maybe we need to announce that in get_next_submodule

>
> I think we should be more particular about what we're allowed to skip -
> in particular, maybe if we're planning to skip this submodule, its
> corresponding directory in the worktree (if one exists) needs to be
> empty.

If the working tree directory is empty for that submodule, it means
it is likely not initialized. But why would we use that as a signal to
skip the submodule?



> > - cp->dir = strbuf_detach(_path, NULL);
> > - prepare_submodule_repo_env(>env_array);
> > + prepare_submodule_repo_env_in_gitdir(>env_array);
> > + cp->dir = xstrdup(repo->gitdir);
>
> Here is where the functionality change (fetch in ".git") described in
> the commit message occurs.

True.

Thanks for the review, I'll try to split up this commit a bit more and
explain each part on its own.


Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-17 Thread Jonathan Tan
> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that "git fetch" of the submodule
> actually doesn't need to be run in the submodules worktree. So let's run
> it in its git dir instead.

The commit message needs to be updated, I think - this patch does
significantly more than fetching in the gitdir.

> This patch leaks the cp->dir in get_next_submodule, as any further
> callback in run_processes_parallel doesn't have access to the child
> process any more.

The cp->dir is already leaked - probably better to write "cp->dir in
get_next_submodule() is still leaked, but this will be fixed in a
subsequent patch".

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> + prepare_submodule_repo_env_no_git_dir(out);
> + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);

Why does GIT_DIR need to be set? Is it to avoid subcommands recursively
checking the parent directories in case the CWD is a malformed Git
repository? If yes, maybe it's worth adding a comment.

> +static struct repository *get_submodule_repo_for(struct repository *r,
> +  const struct submodule *sub)
> +{
> + struct repository *ret = xmalloc(sizeof(*ret));
> +
> + if (repo_submodule_init(ret, r, sub)) {
> + /*
> +  * No entry in .gitmodules? Technically not a submodule,
> +  * but historically we supported repositories that happen to be
> +  * in-place where a gitlink is. Keep supporting them.
> +  */
> + struct strbuf gitdir = STRBUF_INIT;
> + strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
> + if (repo_init(ret, gitdir.buf, NULL)) {
> + strbuf_release();
> + return NULL;
> + }
> + strbuf_release();
> + }
> +
> + return ret;
> +}

This is the significant thing that this patch does more - an unskipped
submodule is now something that either passes the checks in
repo_submodule_init() or the checks in repo_init(), which seems to be
stricter than the current check that ".git" points to a directory or is
one. This means that we skip certain broken repositories, and this
necessitates a change in the test.

I think we should be more particular about what we're allowed to skip -
in particular, maybe if we're planning to skip this submodule, its
corresponding directory in the worktree (if one exists) needs to be
empty.

> - cp->dir = strbuf_detach(_path, NULL);
> - prepare_submodule_repo_env(>env_array);
> + prepare_submodule_repo_env_in_gitdir(>env_array);
> + cp->dir = xstrdup(repo->gitdir);

Here is where the functionality change (fetch in ".git") described in
the commit message occurs.


[PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-16 Thread Stefan Beller
This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

This patch leaks the cp->dir in get_next_submodule, as any further
callback in run_processes_parallel doesn't have access to the child
process any more. In an early iteration of this patch, the function
get_submodule_repo_for directly returned the string containing the
git directory, which would be a better design choice for this patch.

However the next patch both fixes the memory leak of cp->dir and also has
a use case for using the full repository handle of the submodule, so
it makes sense to introduce the get_submodule_repo_for here already.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 51 +++--
 t/t5526-fetch-submodules.sh |  7 -
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/submodule.c b/submodule.c
index cbefe5f54d..30c06507e3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -495,6 +495,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1241,6 +1247,29 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+const struct submodule *sub)
+{
+   struct repository *ret = xmalloc(sizeof(*ret));
+
+   if (repo_submodule_init(ret, r, sub)) {
+   /*
+* No entry in .gitmodules? Technically not a submodule,
+* but historically we supported repositories that happen to be
+* in-place where a gitlink is. Keep supporting them.
+*/
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
+   if (repo_init(ret, gitdir.buf, NULL)) {
+   strbuf_release();
+   return NULL;
+   }
+   strbuf_release();
+   }
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1248,12 +1277,11 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
-   const char *git_dir, *default_argv;
+   const char *default_argv;
const struct submodule *submodule;
+   struct repository *repo;
struct submodule default_submodule = SUBMODULE_INIT;
 
if (!S_ISGITLINK(ce->ce_mode))
@@ -1288,16 +1316,12 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   repo = get_submodule_repo_for(spf->r, submodule);
+   if (repo) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
+   cp->dir = xstrdup(repo->gitdir);
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1307,10 +1331,11 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, default_argv);
argv_array_push(>args, 

[PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-17 Thread Stefan Beller
This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller 
---
 submodule.c | 45 +++--
 t/t5526-fetch-submodules.sh |  7 +-
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 00a9a3c6b12..88bce534268 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static char *get_submodule_git_dir(struct repository *r, const char *path)
+{
+   struct repository subrepo;
+   const char *ret;
+
+   if (repo_submodule_init(, r, path)) {
+   /* no entry in .gitmodules? */
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", path);
+   if (repo_init(, gitdir.buf, NULL)) {
+   strbuf_release();
+   return NULL;
+   }
+   }
+
+   ret = xstrdup(subrepo.gitdir);
+   repo_clear();
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
const char *git_dir, *default_argv;
@@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   git_dir = get_submodule_git_dir(spf->r, ce->name);
+   if (git_dir) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
+   cp->dir = git_dir;
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1293,10 +1314,10 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, default_argv);
argv_array_push(>args, "--submodule-prefix");
argv_array_push(>args, submodule_prefix.buf);
+
+   free(git_dir);
ret = 1;
}
-   strbuf_release(_path);
-   strbuf_release(_git_dir);
strbuf_release(_prefix);
if (ret) {
spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken 
repository' '
 
test_must_fail git -C dst status &&
test_must_fail git -C dst diff &&
-   test_must_fail git -C dst fetch --recurse-submodules
+
+   # git-fetch cannot find the git directory of the submodule,
+   # so it will do nothing, successfully, as it cannot distinguish between
+   # this broken submodule and a submodule that was just set active but
+   # not cloned yet
+   git -C dst fetch --recurse-submodules
 '
 
 test_expect_success 

Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-13 Thread Stefan Beller
On Wed, Sep 12, 2018 at 11:36 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > This patch started as a refactoring to make 'get_next_submodule' more
> > readable, but upon doing so, I realized that git-fetch actually doesn't
> > need to be run in the worktree. So let's run it in the git dir instead.
>
> It may be clear to the author but not clear to the reader of the
> above paragraph that "worktree", "fetch" and "git dir" all refer to
> the recursively invoked operation that updates the submodules
> repository.  s/git-fetch/"git fetch" for the submodule/ should be
> sufficient to help the readers.
>
> > That should pave the way towards fetching submodules that are currently
> > not checked out.
>
> Very good.
>
> > +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> > +{
> > + prepare_submodule_repo_env_no_git_dir(out);
> > + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> > +}
> > +
> >  /* Helper function to display the submodule header line prior to the full
> >   * summary output. If it can locate the submodule objects directory it will
> >   * attempt to lookup both the left and right commits and put them into the
> > @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct 
> > submodule *submodule,
> >   return spf->default_option;
> >  }
> >
> > +static const char *get_submodule_git_dir(struct repository *r, const char 
> > *path)
> > +{
> > + struct repository subrepo;
> > + const char *ret;
> > +
> > + if (repo_submodule_init(, r, path)) {
> > + /* no entry in .gitmodules? */
> > + struct strbuf gitdir = STRBUF_INIT;
> > + strbuf_repo_worktree_path(, r, "%s/.git", path);
> > + if (repo_init(, gitdir.buf, NULL)) {
> > + strbuf_release();
> > + return NULL;
> > + }
>
> This is for the modern "absorbed" layout?  Do we get a notice and
> encouragement to migrate from the historical layout, or there is no
> need to (e.g. the migration happens automatically in some other
> codepaths)?

No, the absorbed would also be handled by repo_submodule_init.
I wrote a patch once to migrate repo_submodule_init to take a
"struct *submodule" instead of a path as the third argument, which
would fall in line with this patch as well, I'll dig it up.

Historically git-fetch supported repositories that are not submodules
(but have a gitlink and a working tree in place) as well. That is covered
here. (see comment /* no entry in .gitmodules? */)

> > - strbuf_release(_path);
> > - strbuf_release(_git_dir);
>
> But if it is a leak, it is easily plugged by freeing git_dir here, I
> think.

Thanks.


Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-12 Thread Junio C Hamano
Stefan Beller  writes:

> This patch started as a refactoring to make 'get_next_submodule' more
> readable, but upon doing so, I realized that git-fetch actually doesn't
> need to be run in the worktree. So let's run it in the git dir instead.

It may be clear to the author but not clear to the reader of the
above paragraph that "worktree", "fetch" and "git dir" all refer to
the recursively invoked operation that updates the submodules
repository.  s/git-fetch/"git fetch" for the submodule/ should be
sufficient to help the readers.

> That should pave the way towards fetching submodules that are currently
> not checked out.

Very good.

> +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> +{
> + prepare_submodule_repo_env_no_git_dir(out);
> + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> +}
> +
>  /* Helper function to display the submodule header line prior to the full
>   * summary output. If it can locate the submodule objects directory it will
>   * attempt to lookup both the left and right commits and put them into the
> @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct 
> submodule *submodule,
>   return spf->default_option;
>  }
>  
> +static const char *get_submodule_git_dir(struct repository *r, const char 
> *path)
> +{
> + struct repository subrepo;
> + const char *ret;
> +
> + if (repo_submodule_init(, r, path)) {
> + /* no entry in .gitmodules? */
> + struct strbuf gitdir = STRBUF_INIT;
> + strbuf_repo_worktree_path(, r, "%s/.git", path);
> + if (repo_init(, gitdir.buf, NULL)) {
> + strbuf_release();
> + return NULL;
> + }

This is for the modern "absorbed" layout?  Do we get a notice and
encouragement to migrate from the historical layout, or there is no
need to (e.g. the migration happens automatically in some other
codepaths)?

> + }
> +
> + ret = xstrdup(subrepo.gitdir);
> + repo_clear();
> +
> + return ret;
> +}

Returned value from this function is xstrdup()'ed so the caller
owns, not borrows.  There is no need to return "const char *" from
this function.  Also the caller needs to free it once done.

>  static int get_next_submodule(struct child_process *cp,
> struct strbuf *err, void *data, void **task_cb)
>  {
> @@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
>   struct submodule_parallel_fetch *spf = data;
>  
>   for (; spf->count < spf->r->index->cache_nr; spf->count++) {
> - struct strbuf submodule_path = STRBUF_INIT;
> - struct strbuf submodule_git_dir = STRBUF_INIT;
>   struct strbuf submodule_prefix = STRBUF_INIT;
>   const struct cache_entry *ce = spf->r->index->cache[spf->count];
>   const char *git_dir, *default_argv;
> @@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process 
> *cp,
>   continue;
>   }
>  
> - strbuf_repo_worktree_path(_path, spf->r, "%s", 
> ce->name);
> - strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
>   strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
> - git_dir = read_gitfile(submodule_git_dir.buf);
> - if (!git_dir)
> - git_dir = submodule_git_dir.buf;
> - if (is_directory(git_dir)) {

In the old code, git_dir came from read_gitfile() which borrowed.

> + git_dir = get_submodule_git_dir(spf->r, ce->name);

In the new code, we own it, so we'd eventually need to get rid of
it.  How does it happen?

> + if (git_dir) {
>   child_process_init(cp);
> - cp->dir = strbuf_detach(_path, NULL);
> - prepare_submodule_repo_env(>env_array);
> + prepare_submodule_repo_env_in_gitdir(>env_array);
> + cp->dir = git_dir;

Does cp now own it and cp->dir gets freed once run_processes_parallel()
is done with this task?  Or is cp->dir simply leaking?  The old code
gave the result of strbuf_detach(), so even if cp->dir is leaking,
the leak is not new in this patch.

>   cp->git_cmd = 1;
>   if (!spf->quiet)
>   strbuf_addf(err, "Fetching submodule %s%s\n",
> @@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
>   argv_array_push(>args, submodule_prefix.buf);
>   ret = 1;
>   }
> - strbuf_release(_path);
> - strbuf_release(_git_dir);

But if it is a leak, it is easily plugged by freeing git_dir here, I
think.

Thanks.


>   strbuf_release(_prefix);
>   if (ret) {
>   spf->count++;
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 

[PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-11 Thread Stefan Beller
This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that git-fetch actually doesn't
need to be run in the worktree. So let's run it in the git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

Signed-off-by: Stefan Beller 
---
 submodule.c | 43 ++---
 t/t5526-fetch-submodules.sh |  7 +-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/submodule.c b/submodule.c
index 00a9a3c6b12..1e6781504f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -481,6 +481,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static const char *get_submodule_git_dir(struct repository *r, const char 
*path)
+{
+   struct repository subrepo;
+   const char *ret;
+
+   if (repo_submodule_init(, r, path)) {
+   /* no entry in .gitmodules? */
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", path);
+   if (repo_init(, gitdir.buf, NULL)) {
+   strbuf_release();
+   return NULL;
+   }
+   }
+
+   ret = xstrdup(subrepo.gitdir);
+   repo_clear();
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1234,8 +1261,6 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
const char *git_dir, *default_argv;
@@ -1274,16 +1299,12 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   git_dir = get_submodule_git_dir(spf->r, ce->name);
+   if (git_dir) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
+   cp->dir = git_dir;
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1295,8 +1316,6 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, submodule_prefix.buf);
ret = 1;
}
-   strbuf_release(_path);
-   strbuf_release(_git_dir);
strbuf_release(_prefix);
if (ret) {
spf->count++;
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 6c2f9b2ba26..42692219a1a 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -566,7 +566,12 @@ test_expect_success 'fetching submodule into a broken 
repository' '
 
test_must_fail git -C dst status &&
test_must_fail git -C dst diff &&
-   test_must_fail git -C dst fetch --recurse-submodules
+
+   # git-fetch cannot find the git directory of the submodule,
+   # so it will do nothing, successfully, as it cannot distinguish between
+   # this broken submodule and a submodule that was just set active but
+   # not cloned yet
+   git -C dst fetch --recurse-submodules
 '
 
 test_expect_success "fetch new commits when submodule got renamed" '
-- 
2.19.0.397.gdd90340f6a-goog