Re: git submodules implementation question

2016-09-01 Thread Uma Srinivasan
>>> The final version needs to be accompanied with tests to show the
>>> effect of this change for callers.  A test would set up a top-level
>>> and submodule, deliberately break submodule/.git/ repository and
>>> show what breaks and how without this change.

Agreed!

The repo where the original problem surfaced is huge and in fact "git
status" appears to go into an infinite loop forking a zillion
processes on the system. If the dev system is small, it brings it to a
standstill. However, the good news is that I could build myself a
smaller reproducer by doing the following:

1) mkdir sm_test
2) cd sm_test
3) git clone git://git.mysociety.org/commonlib commonlib
4) git init
5) git submodule add ./commonlib/
6) cd commonlib/.git
7) rm -f all files

After this "git status" will fork several thousand processes but it
will ultimately fail as it runs into the path length max limit. I
still don't know why this doesn't happen with my original problem
repo. Perhaps I have to let it run overnight or perhaps it runs into
other system limitations before then

Anyway, with the fix "git status" fails quickly both in my reproducer
(and original repo) with the following message.
fatal: Not a git repository: '.git'
fatal: 'git status --porcelain' failed in submodule commonlib

Hope this helps.

Uma


On Thu, Sep 1, 2016 at 12:19 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> The final version needs to be accompanied with tests to show the
>>> effect of this change for callers.  A test would set up a top-level
>>> and submodule, deliberately break submodule/.git/ repository and
>>> show what breaks and how without this change.
>>
>> Tests are really good at providing this context as well, or to communicate
>> the actual underlying problem, which is not quite clear to me.
>> That is why I refrained from jumping into the discussion as I think the
>> first few emails were dropped from the mailing list and I am missing context.
>
> I do not know where you started reading, but the gist of it is that
> submodule.c spawns subprocess to run in the submodule's context by
> assuming that chdir'ing into the  of the submodule and running
> it (i.e. cp.dir set to  to drive start_command()) is
> sufficient.  When /.git (either it is a directory itself or it
> points at a directory in .git/module/ in the superproject) is
> a corrupt repository, running "git -C  command" would try to
> auto-detect the repository, because it thinks /.git is not a
> repository and it thinks it is not at the top-level of the working
> tree, and instead finds the repository of the top-level, which is
> almost never what we want.
>


Re: git submodules implementation question

2016-09-01 Thread Stefan Beller
> The final version needs to be accompanied with tests to show the
> effect of this change for callers.  A test would set up a top-level
> and submodule, deliberately break submodule/.git/ repository and
> show what breaks and how without this change.

Tests are really good at providing this context as well, or to communicate
the actual underlying problem, which is not quite clear to me.
That is why I refrained from jumping into the discussion as I think the
first few emails were dropped from the mailing list and I am missing context.

>
> So it is a bit more involved than just a single liner patch with one
> paragraph log message.  I may be able to find or make some time
> after I tag the 2.10 final this weekend to do so myself.

Thanks,
Stefan


Re: git submodules implementation question

2016-09-01 Thread Stefan Beller
On Thu, Sep 1, 2016 at 1:21 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Stefan Beller  writes:
>>
 The final version needs to be accompanied with tests to show the
 effect of this change for callers.  A test would set up a top-level
 and submodule, deliberately break submodule/.git/ repository and
 show what breaks and how without this change.
>>>
>>> Tests are really good at providing this context as well, or to communicate
>>> the actual underlying problem, which is not quite clear to me.
>>> That is why I refrained from jumping into the discussion as I think the
>>> first few emails were dropped from the mailing list and I am missing 
>>> context.
>>
>> I do not know where you started reading, but the gist of it is that
>> submodule.c spawns subprocess to run in the submodule's context by
>> assuming that chdir'ing into the  of the submodule and running
>> it (i.e. cp.dir set to  to drive start_command()) is
>> sufficient.  When /.git (either it is a directory itself or it
>> points at a directory in .git/module/ in the superproject) is
>> a corrupt repository, running "git -C  command" would try to
>> auto-detect the repository, because it thinks /.git is not a
>> repository and it thinks it is not at the top-level of the working
>> tree, and instead finds the repository of the top-level, which is
>> almost never what we want.
>
> This is with a test that covers the call in get_next_submodule() for
> the parallel fetch callback.  I think many of the codepaths will end
> up recursing forever the same way without the fix in a submodule
> repository that is broken in a similar way, but I didn't check, so
> I do not consider this to be completed.

Oh I see. That seems like a nasty bug.

>
> -- >8 --
> Subject: submodule: avoid auto-discovery in prepare_submodule_repo_env()
>
> The function is used to set up the environment variable used in a
> subprocess we spawn in a submodule directory.  The callers set up a
> child_process structure, find the working tree path of one submodule
> and set .dir field to it, and then use start_command() API to spawn
> the subprocess like "status", "fetch", etc.
>
> When this happens, we expect that the ".git" (either a directory or
> a gitfile that points at the real location) in the current working
> directory of the subprocess MUST be the repository for the submodule.
>
> If this ".git" thing is a corrupt repository, however, because
> prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the
> subprocess will see ".git", thinks it is not a repository, and
> attempt to find one by going up, likely to end up in finding the
> repository of the superproject.  In some codepaths, this will cause
> a command run with the "--recurse-submodules" option to recurse
> forever.
>
> By exporting GIT_DIR=.git, disable the auto-discovery logic in the
> subprocess, which would instead stop it and report an error.

and GIT_DIR=.git works for both .git files as well as the old fashioned way,
with the submodule repository at .git/, although that is not really documented.

>
> Not-signed-off-yet.
> ---
>  submodule.c |  1 +
>  t/t5526-fetch-submodules.sh | 29 +
>  2 files changed, 30 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index 1b5cdfb..e8258f0 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
> if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> argv_array_push(out, *var);
> }
> +   argv_array_push(out, "GIT_DIR=.git");
>  }
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 954d0e4..b2dee30 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -485,4 +485,33 @@ test_expect_success 'fetching submodules respects 
> parallel settings' '
> )
>  '
>
> +test_expect_success 'fetching submodule into a broken repository' '
> +   # Prepare src and src/sub nested in it
> +   git init src &&
> +   (
> +   cd src &&
> +   git init sub &&
> +   git -C sub commit --allow-empty -m "initial in sub" &&
> +   git submodule add -- ./sub sub &&
> +   git commit -m "initial in top"
> +   ) &&

This is not needed, as setup() set up some repositories for you.

> +
> +   # Clone the old-fashoned way
> +   git clone src dst &&

if you don't go with your own setup, maybe:

git clone . dst
git -C dst clone ../submodule submodule

# and further down s/sub/submodule/

> +   git -C dst clone ../src/sub sub &&
> +
> +   # Make sure that old-fashoned layout is still supported

fashioned

> +   git -C dst status &&
> +
> +   # Recursive-fetch works fine
> +   git -C dst fetch --recurse-submodules &&
> +
> +   # Break the receiving submodule
> +   

Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Uma Srinivasan  writes:

> Yes, this one line fix addresses my problem.
>
> So, what is the next step? Will someone submit a patch or should I?
> Please note that I've never submitted a patch before, but I don't mind
> learning how to.

The final version needs to be accompanied with tests to show the
effect of this change for callers.  A test would set up a top-level
and submodule, deliberately break submodule/.git/ repository and
show what breaks and how without this change.

For example, there is a call in ok_to_remove_submodule() to run "git
status" in the submodule directory.  A test writer needs to find who
calls ok_to_remove_submodule() in what codepath (e.g. builtin/rm.c).
Then after figuring out under what condition the check is triggered,
write a test that runs "git rm" in a way that this change makes a
difference.  Hopefully, the test will fail without this change, and
will pass with this change.

So it is a bit more involved than just a single liner patch with one
paragraph log message.  I may be able to find or make some time
after I tag the 2.10 final this weekend to do so myself.

>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
>> if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>> argv_array_push(out, *var);
>> }
>> +   argv_array_push(out, "GIT_DIR=.git");
>>  }
>>



Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Stefan Beller  writes:

>> +test_expect_success 'fetching submodule into a broken repository' '
>> +   # Prepare src and src/sub nested in it
>> +   git init src &&
>> +   (
>> +   cd src &&
>> +   git init sub &&
>> +   git -C sub commit --allow-empty -m "initial in sub" &&
>> +   git submodule add -- ./sub sub &&
>> +   git commit -m "initial in top"
>> +   ) &&
>
> This is not needed, as setup() set up some repositories for you.

I didn't want any random cruft left behind in the top-level by
previous tests, so this is very much deliberate.


Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Junio C Hamano  writes:

If we add

# "git diff" should terminate with an error.
# NOTE: without fix this will recurse forever!
test_must_fail git -C dst diff &&

after breaking the repository, we can also see "git diff" recurse
forever, because it wants to know if "sub" submodule is modified,
attempts to run "git status" in there, and ends up running that
command in the context of the parent repository.

I am tempted to cheat and commit this, even though this test is no
longer about fetching submodules.

-- >8 --
[PATCH] submodule: avoid auto-discovery in prepare_submodule_repo_env()

The function is used to set up the environment variable used in a
subprocess we spawn in a submodule directory.  The callers set up a
child_process structure, find the working tree path of one submodule
and set .dir field to it, and then use start_command() API to spawn
the subprocess like "status", "fetch", etc.

When this happens, we expect that the ".git" (either a directory or
a gitfile that points at the real location) in the current working
directory of the subprocess MUST be the repository for the submodule.

If this ".git" thing is a corrupt repository, however, because
prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the
subprocess will see ".git", thinks it is not a repository, and
attempt to find one by going up, likely to end up in finding the
repository of the superproject.  In some codepaths, this will cause
a command run with the "--recurse-submodules" option to recurse
forever.

By exporting GIT_DIR=.git, disable the auto-discovery logic in the
subprocess, which would instead stop it and report an error.

The test illustrates existing problems in a few callsites of this
function.  Without this fix, "git fetch --recurse-submodules", "git
status" and "git diff" keep recursing forever.

Signed-off-by: Junio C Hamano 
---
 submodule.c |  1 +
 t/t5526-fetch-submodules.sh | 35 +++
 2 files changed, 36 insertions(+)

diff --git a/submodule.c b/submodule.c
index 4532b11..2801fbb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
argv_array_push(out, *var);
}
+   argv_array_push(out, "GIT_DIR=.git");
 }
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 954d0e4..f3b0a8d 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -485,4 +485,39 @@ test_expect_success 'fetching submodules respects parallel 
settings' '
)
 '
 
+test_expect_success 'fetching submodule into a broken repository' '
+   # Prepare src and src/sub nested in it
+   git init src &&
+   (
+   cd src &&
+   git init sub &&
+   git -C sub commit --allow-empty -m "initial in sub" &&
+   git submodule add -- ./sub sub &&
+   git commit -m "initial in top"
+   ) &&
+
+   # Clone the old-fashoned way
+   git clone src dst &&
+   git -C dst clone ../src/sub sub &&
+
+   # Make sure that old-fashoned layout is still supported
+   git -C dst status &&
+
+   # "diff" would find no change
+   git -C dst diff --exit-code &&
+
+   # Recursive-fetch works fine
+   git -C dst fetch --recurse-submodules &&
+
+   # Break the receiving submodule
+   rm -f dst/sub/.git/HEAD &&
+
+   # NOTE: without the fix the following tests will recurse forever!
+   # They should terminate with an error.
+
+   test_must_fail git -C dst status &&
+   test_must_fail git -C dst diff &&
+   test_must_fail git -C dst fetch --recurse-submodules
+'
+
 test_done
-- 
2.10.0-rc2-314-g775ea9a





Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Uma Srinivasan  writes:

> Anyway, with the fix "git status" fails quickly both in my reproducer
> (and original repo) with the following message.
> fatal: Not a git repository: '.git'
> fatal: 'git status --porcelain' failed in submodule commonlib

Thanks, that is exactly what I wanted to see as an outcome when I
did the one-liner patch.


Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>>> The final version needs to be accompanied with tests to show the
>>> effect of this change for callers.  A test would set up a top-level
>>> and submodule, deliberately break submodule/.git/ repository and
>>> show what breaks and how without this change.
>>
>> Tests are really good at providing this context as well, or to communicate
>> the actual underlying problem, which is not quite clear to me.
>> That is why I refrained from jumping into the discussion as I think the
>> first few emails were dropped from the mailing list and I am missing context.
>
> I do not know where you started reading, but the gist of it is that
> submodule.c spawns subprocess to run in the submodule's context by
> assuming that chdir'ing into the  of the submodule and running
> it (i.e. cp.dir set to  to drive start_command()) is
> sufficient.  When /.git (either it is a directory itself or it
> points at a directory in .git/module/ in the superproject) is
> a corrupt repository, running "git -C  command" would try to
> auto-detect the repository, because it thinks /.git is not a
> repository and it thinks it is not at the top-level of the working
> tree, and instead finds the repository of the top-level, which is
> almost never what we want.

This is with a test that covers the call in get_next_submodule() for
the parallel fetch callback.  I think many of the codepaths will end
up recursing forever the same way without the fix in a submodule
repository that is broken in a similar way, but I didn't check, so
I do not consider this to be completed.

-- >8 --
Subject: submodule: avoid auto-discovery in prepare_submodule_repo_env()

The function is used to set up the environment variable used in a
subprocess we spawn in a submodule directory.  The callers set up a
child_process structure, find the working tree path of one submodule 
and set .dir field to it, and then use start_command() API to spawn
the subprocess like "status", "fetch", etc.

When this happens, we expect that the ".git" (either a directory or
a gitfile that points at the real location) in the current working
directory of the subprocess MUST be the repository for the submodule.

If this ".git" thing is a corrupt repository, however, because
prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the
subprocess will see ".git", thinks it is not a repository, and
attempt to find one by going up, likely to end up in finding the
repository of the superproject.  In some codepaths, this will cause
a command run with the "--recurse-submodules" option to recurse
forever.

By exporting GIT_DIR=.git, disable the auto-discovery logic in the
subprocess, which would instead stop it and report an error.

Not-signed-off-yet.
---
 submodule.c |  1 +
 t/t5526-fetch-submodules.sh | 29 +
 2 files changed, 30 insertions(+)

diff --git a/submodule.c b/submodule.c
index 1b5cdfb..e8258f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
argv_array_push(out, *var);
}
+   argv_array_push(out, "GIT_DIR=.git");
 }
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 954d0e4..b2dee30 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -485,4 +485,33 @@ test_expect_success 'fetching submodules respects parallel 
settings' '
)
 '
 
+test_expect_success 'fetching submodule into a broken repository' '
+   # Prepare src and src/sub nested in it
+   git init src &&
+   (
+   cd src &&
+   git init sub &&
+   git -C sub commit --allow-empty -m "initial in sub" &&
+   git submodule add -- ./sub sub &&
+   git commit -m "initial in top"
+   ) &&
+
+   # Clone the old-fashoned way
+   git clone src dst &&
+   git -C dst clone ../src/sub sub &&
+
+   # Make sure that old-fashoned layout is still supported
+   git -C dst status &&
+
+   # Recursive-fetch works fine
+   git -C dst fetch --recurse-submodules &&
+
+   # Break the receiving submodule
+   rm -f dst/sub/.git/HEAD &&
+
+   # Recursive-fetch must terminate
+   # NOTE: without fix this will recurse forever!
+   test_must_fail git -C dst fetch --recurse-submodules
+'
+
 test_done




Re: git submodules implementation question

2016-09-01 Thread Junio C Hamano
Stefan Beller  writes:

>> The final version needs to be accompanied with tests to show the
>> effect of this change for callers.  A test would set up a top-level
>> and submodule, deliberately break submodule/.git/ repository and
>> show what breaks and how without this change.
>
> Tests are really good at providing this context as well, or to communicate
> the actual underlying problem, which is not quite clear to me.
> That is why I refrained from jumping into the discussion as I think the
> first few emails were dropped from the mailing list and I am missing context.

I do not know where you started reading, but the gist of it is that
submodule.c spawns subprocess to run in the submodule's context by
assuming that chdir'ing into the  of the submodule and running
it (i.e. cp.dir set to  to drive start_command()) is
sufficient.  When /.git (either it is a directory itself or it
points at a directory in .git/module/ in the superproject) is
a corrupt repository, running "git -C  command" would try to
auto-detect the repository, because it thinks /.git is not a
repository and it thinks it is not at the top-level of the working
tree, and instead finds the repository of the top-level, which is
almost never what we want.



Re: git submodules implementation question

2016-09-01 Thread Uma Srinivasan
Yes, this one line fix addresses my problem.

So, what is the next step? Will someone submit a patch or should I?
Please note that I've never submitted a patch before, but I don't mind
learning how to.

Thanks,
Uma

> --- a/submodule.c
> +++ b/submodule.c
> @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
> if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
> argv_array_push(out, *var);
> }
> +   argv_array_push(out, "GIT_DIR=.git");
>  }
>


Re: git submodules implementation question

2016-08-31 Thread Junio C Hamano
Junio C Hamano  writes:

> I was wondering if we should unconditionally stuff GIT_DIR= repository location for the submodule> in the cp.env_array passed to
> the function prepare_submodule_repo_env().  As cp.dir will be set to
> the submodule's working tree, there is no need to set GIT_WORK_TREE
> and export it, I think, although it would not hurt.

After checking all callers of prepare_submodule_repo_env() and saw
that cp.dir is set to the top of working directory for the
submodule, I wonder if something as simple and stupid like the
attached patch is sufficient.  Our subprocess will go there, and
there should be a .git embedded directory or a .git file that points
into .git/modules/* in the superproject, and either way, it should
use .git directly underneath it.

 submodule.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/submodule.c b/submodule.c
index 1b5cdfb..e8258f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out)
if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
argv_array_push(out, *var);
}
+   argv_array_push(out, "GIT_DIR=.git");
 }



Re: git submodules implementation question

2016-08-31 Thread Uma Srinivasan
Here's one more attempt where I changed prepare_submodule_repo_env()
to take the submodule git dir as an argument.
Sorry for including this long code diff inline. If there's a better
way to have this discussion with code please let me know.

Thanks,
Uma

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 9d79f19..0741f6c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -465,7 +465,7 @@ static int clone_submodule(const char *path, const
char *gitdir, const char *url
argv_array_push(, path);

cp.git_cmd = 1;
-   prepare_submodule_repo_env(_array);
+   prepare_submodule_repo_env(_array, gitdir);
cp.no_stdin = 1;

return run_command();
diff --git a/submodule.c b/submodule.c
index 5a62aa2..3d9174a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -522,11 +522,31 @@ static int has_remote(const char *refname, const
struct object_id *oid,
return 1;
 }

+static const char *submodule_git_dir = NULL;
+const char *get_submodule_git_dir(const char *path, struct strbuf *bufptr)
+{
+   strbuf_addf(bufptr, "%s/.git", path);
+   submodule_git_dir = read_gitfile(bufptr->buf);
+   if (!submodule_git_dir)
+   submodule_git_dir = bufptr->buf;
+   if (!is_directory(submodule_git_dir)) {
+   return NULL;
+   }
+   return submodule_git_dir;
+}
+
 static int submodule_needs_pushing(const char *path, const unsigned
char sha1[20])
 {
if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
return 0;

+   struct strbuf gitdirbuf = STRBUF_INIT;
+const char *sm_gitdir = get_submodule_git_dir(path, );
+   if (sm_gitdir == NULL) {
+   strbuf_release();
+   return 0;
+   }
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {"rev-list", NULL, "--not",
"--remotes", "-n", "1" , NULL};
@@ -535,7 +555,7 @@ static int submodule_needs_pushing(const char
*path, const unsigned char sha1[20

argv[1] = sha1_to_hex(sha1);
cp.argv = argv;
-   prepare_submodule_repo_env(_array);
+   prepare_submodule_repo_env(_array, sm_gitdir);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.out = -1;
@@ -551,6 +571,7 @@ static int submodule_needs_pushing(const char
*path, const unsigned char sha1[20
return needs_pushing;
}

+   strbuf_release();
return 0;
 }

@@ -617,12 +638,18 @@ static int push_submodule(const char *path)
if (add_submodule_odb(path))
return 1;

+   struct strbuf gitdirbuf = STRBUF_INIT;
+const char *sm_gitdir = get_submodule_git_dir(path, );
+   if (sm_gitdir == NULL) {
+   strbuf_release();
+   return 0;
+   }
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {"push", NULL};

cp.argv = argv;
-   prepare_submodule_repo_env(_array);
+   prepare_submodule_repo_env(_array, sm_gitdir);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.dir = path;
@@ -631,6 +658,7 @@ static int push_submodule(const char *path)
close(cp.out);
}

+   strbuf_release();
return 1;
 }

@@ -665,10 +693,16 @@ static int is_submodule_commit_present(const
char *path, unsigned char sha1[20])
struct child_process cp = CHILD_PROCESS_INIT;
const char *argv[] = {"rev-list", "-n", "1", NULL,
"--not", "--all", NULL};
struct strbuf buf = STRBUF_INIT;
+   struct strbuf gitdirbuf = STRBUF_INIT;
+   const char *sm_gitdir = get_submodule_git_dir(path, );
+   if (sm_gitdir == NULL) {
+   strbuf_release();
+   return 0;
+   }

argv[3] = sha1_to_hex(sha1);
cp.argv = argv;
-   prepare_submodule_repo_env(_array);
+   prepare_submodule_repo_env(_array, sm_gitdir);
cp.git_cmd = 1;
cp.no_stdin = 1;
cp.dir = path;
@@ -676,6 +710,7 @@ static int is_submodule_commit_present(const char
*path, unsigned char sha1[20])
is_present = 1;

strbuf_release();
+   strbuf_release();
}
return is_present;
 }
@@ -851,7 +886,7 @@ static int get_next_submodule(struct child_process *cp,
if (is_directory(git_dir)) {
child_process_init(cp);
cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   

Re: git submodules implementation question

2016-08-31 Thread Uma Srinivasan
> We want to affect only the process we are going to spawn to work
> inside the submodule, not ourselves, which is what this call does;
> this does not sound like a good idea.

Okay, in that case I would have to pass the "git_dir" as a new
argument to prepare_submodule_repo_env(). I know what to pass from the
is_submodule_modified() caller. I don't think it's all that obvious
for the other callers.

Thanks,
Uma

On Wed, Aug 31, 2016 at 11:44 AM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> diff --git a/submodule.c b/submodule.c
>> index 5a62aa2..23443a7 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path,
>> int ignore_untracked)
>> return 0;
>>
>> }
>> +   /* stuff submodule git dir into env var */
>> +   set_git_dir(git_dir);
>
> We want to affect only the process we are going to spawn to work
> inside the submodule, not ourselves, which is what this call does;
> this does not sound like a good idea.
>


Re: git submodules implementation question

2016-08-31 Thread Junio C Hamano
Uma Srinivasan  writes:

> diff --git a/submodule.c b/submodule.c
> index 5a62aa2..23443a7 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path,
> int ignore_untracked)
> return 0;
>
> }
> +   /* stuff submodule git dir into env var */
> +   set_git_dir(git_dir);

We want to affect only the process we are going to spawn to work
inside the submodule, not ourselves, which is what this call does;
this does not sound like a good idea.



Re: git submodules implementation question

2016-08-31 Thread Uma Srinivasan
FWIW, I tried the following quick change based on the info. provided
by Junio and it seems to "fix" my original problem. I'll let you
experts figure out if we need a more complete fix or not.

diff --git a/submodule.c b/submodule.c
index 5a62aa2..23443a7 100644
--- a/submodule.c
+++ b/submodule.c
@@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path,
int ignore_untracked)
return 0;

}
+   /* stuff submodule git dir into env var */
+   set_git_dir(git_dir);
+
strbuf_reset();

if (ignore_untracked)
@@ -1279,5 +1282,9 @@ void prepare_submodule_repo_env(struct argv_array *out)
for (var = local_repo_env; *var; var++) {
if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
argv_array_push(out, *var);
+   if (strcmp(*var, GIT_DIR_ENVIRONMENT))
+   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
+get_git_dir());
}
+
 }

Thanks,
Uma

On Wed, Aug 31, 2016 at 9:42 AM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>>> I might suggest to update prepare_submodule_repo_env() so that the
>>> spawned process will *NOT* have to guess where the working tree and
>>> repository by exporting GIT_DIR (set to "git_dir" we discover above)
>>> and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the
>>> working tree of the submodule).  That would stop the "git status" to
>>> guess (and fooled by a corrupted dir/.git that is not a git
>>> repository).
>>
>> Here's where I am struggling with my lack of knowledge of git
>> internals and the implementation particularly in the context of how
>> environment variables are passed from the parent to the child process.
>
> Ah, I was primarily addressing Jacob in the latter part of my
> message, as he's looked at similar things in his recent topic.
>
>> Are you suggesting that we set up the child process environment array
>> (the "out" argument) in prepare_submodule_repo_env() to include
>> GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to
>> CONFIG_DATA_ENVIRONMENT that is there now?
>
> I was wondering if we should unconditionally stuff GIT_DIR= repository location for the submodule> in the cp.env_array passed to
> the function prepare_submodule_repo_env().  As cp.dir will be set to
> the submodule's working tree, there is no need to set GIT_WORK_TREE
> and export it, I think, although it would not hurt.
>
> After all, we _are_ going into a separate and different project's
> repository (that is what a submodule is), and we _know_ where its
> repository data (i.e. GIT_DIR) and the location of its working tree
> (i.e. GIT_WORK_TREE).  There is no reason for the process that will
> work in the submodule to go through the usual "do we have .git in
> our $cwd that is a repository?  if not how about the parent directory
> of $cwd?  go up until we find one and that directory is the top of
> the working tree" discovery.
>
> More importantly, this matters when your GIT_DIR for the submodule
> is somehow corrupt.  The discovery process would say "there is .git
> in $cwd but that is not a repository" and continue upwards, which
> likely would find the repository for the top-level superproject,
> which definitely is _not_ want you want to happen.  And one way to
> avoid it is to tell the setup.c code that it should not do the
> discovery by being explicit.
>


Re: git submodules implementation question

2016-08-31 Thread Junio C Hamano
Uma Srinivasan  writes:

>> I might suggest to update prepare_submodule_repo_env() so that the
>> spawned process will *NOT* have to guess where the working tree and
>> repository by exporting GIT_DIR (set to "git_dir" we discover above)
>> and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the
>> working tree of the submodule).  That would stop the "git status" to
>> guess (and fooled by a corrupted dir/.git that is not a git
>> repository).
>
> Here's where I am struggling with my lack of knowledge of git
> internals and the implementation particularly in the context of how
> environment variables are passed from the parent to the child process.

Ah, I was primarily addressing Jacob in the latter part of my
message, as he's looked at similar things in his recent topic.

> Are you suggesting that we set up the child process environment array
> (the "out" argument) in prepare_submodule_repo_env() to include
> GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to
> CONFIG_DATA_ENVIRONMENT that is there now?

I was wondering if we should unconditionally stuff GIT_DIR= in the cp.env_array passed to
the function prepare_submodule_repo_env().  As cp.dir will be set to
the submodule's working tree, there is no need to set GIT_WORK_TREE
and export it, I think, although it would not hurt.

After all, we _are_ going into a separate and different project's
repository (that is what a submodule is), and we _know_ where its
repository data (i.e. GIT_DIR) and the location of its working tree
(i.e. GIT_WORK_TREE).  There is no reason for the process that will
work in the submodule to go through the usual "do we have .git in
our $cwd that is a repository?  if not how about the parent directory
of $cwd?  go up until we find one and that directory is the top of
the working tree" discovery.

More importantly, this matters when your GIT_DIR for the submodule
is somehow corrupt.  The discovery process would say "there is .git
in $cwd but that is not a repository" and continue upwards, which
likely would find the repository for the top-level superproject,
which definitely is _not_ want you want to happen.  And one way to
avoid it is to tell the setup.c code that it should not do the
discovery by being explicit.



Re: git submodules implementation question

2016-08-30 Thread Uma Srinivasan
On Tue, Aug 30, 2016 at 10:53 AM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> I think the following fix is still needed to is_submodule_modified():
>>
>> strbuf_addf(, "%s/.git", path);
>> git_dir = read_gitfile(buf.buf);
>> if (!git_dir) {
>> git_dir = buf.buf;
>>  ==>   if (!is_git_directory(git_dir)) {
>>  ==> die("Corrupted .git dir in submodule %s", path);
>>  ==>   }
>> }
>
> If it is so important that git_dir is a valid Git
> repository after
>
> git_dir = read_gitfile(buf.buf);
> if (!git_dir)
> git_dir = buf.buf;
>
> is done to determine what "git_dir" to use, it seems to me that it
> does not make much sense to check ONLY dir/.git that is a directory
> and leave .git/modules/$name that dir/.git file points at unchecked.
>

Okay.

> But there is much bigger problem with the above addition, I think.
> There also can be a case where dir/ does not even have ".git" in it.
> A submodule the user is not interested in will just have an empty
> directory there, and immediately after the above three lines I
> reproduced above, we have this
>
> if (!is_directory(git_dir)) {
> strbuf_release();
> return 0;
> }
>

Good to know about this use case.

> The added check will break the use case.  If anything, that check,
> if this code needs to verify that "git_dir" points at a valid Git
> repository, should come _after_ that.

Okay.

>
> Shouldn't "git-status --porcelain" run in the submodule notice that
> it is not a valid repository and quickly die anyway?  Why should we
> even check before spawning that process in the first place?

I don't see any reason to disagree with this.

>
> I might suggest to update prepare_submodule_repo_env() so that the
> spawned process will *NOT* have to guess where the working tree and
> repository by exporting GIT_DIR (set to "git_dir" we discover above)
> and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the
> working tree of the submodule).  That would stop the "git status" to
> guess (and fooled by a corrupted dir/.git that is not a git
> repository).
>

Here's where I am struggling with my lack of knowledge of git
internals and the implementation particularly in the context of how
environment variables are passed from the parent to the child process.

Are you suggesting that we set up the child process environment array
(the "out" argument) in prepare_submodule_repo_env() to include
GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to
CONFIG_DATA_ENVIRONMENT that is there now?  Can I use the strcmp and
argv_array_push() calls to do this? There are several callers for this
routine prepare_submodule_repo_env(). Would the caller pass the git
dir and work tree as arguments to this routine or would the caller set
up the environment variables as needed? Is there any documentation or
other example code where similar passing of env. vars to a child
process is being done?

Sorry for all these questions but I'm still a novice as far as the
code is concerned.

Thanks for your help.

Uma


Re: git submodules implementation question

2016-08-30 Thread Junio C Hamano
Uma Srinivasan  writes:

> I think the following fix is still needed to is_submodule_modified():
>
> strbuf_addf(, "%s/.git", path);
> git_dir = read_gitfile(buf.buf);
> if (!git_dir) {
> git_dir = buf.buf;
>  ==>   if (!is_git_directory(git_dir)) {
>  ==> die("Corrupted .git dir in submodule %s", path);
>  ==>   }
> }

If it is so important that git_dir is a valid Git
repository after

git_dir = read_gitfile(buf.buf);
if (!git_dir)
git_dir = buf.buf;

is done to determine what "git_dir" to use, it seems to me that it
does not make much sense to check ONLY dir/.git that is a directory
and leave .git/modules/$name that dir/.git file points at unchecked.

But there is much bigger problem with the above addition, I think.
There also can be a case where dir/ does not even have ".git" in it.
A submodule the user is not interested in will just have an empty
directory there, and immediately after the above three lines I
reproduced above, we have this

if (!is_directory(git_dir)) {
strbuf_release();
return 0;
}

The added check will break the use case.  If anything, that check,
if this code needs to verify that "git_dir" points at a valid Git
repository, should come _after_ that.

Shouldn't "git-status --porcelain" run in the submodule notice that
it is not a valid repository and quickly die anyway?  Why should we
even check before spawning that process in the first place?

I might suggest to update prepare_submodule_repo_env() so that the
spawned process will *NOT* have to guess where the working tree and
repository by exporting GIT_DIR (set to "git_dir" we discover above)
and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the
working tree of the submodule).  That would stop the "git status" to
guess (and fooled by a corrupted dir/.git that is not a git
repository).



Re: git submodules implementation question

2016-08-30 Thread Jacob Keller
On Mon, Aug 29, 2016 at 11:09 PM, Jacob Keller  wrote:
> On Mon, Aug 29, 2016 at 5:12 PM, Uma Srinivasan  
> wrote:
>> This is great! Thanks Jake. If you happen to have the patch ID it
>> would be helpful.
>>
>> Uma
>>
>
> http://public-inbox.org/git/1472236108.28343.5.ca...@intel.com/


Actually correct patch is
http://public-inbox.org/git/20160825233243.30700-6-jacob.e.kel...@intel.com/

Thanks,
Jake


Re: git submodules implementation question

2016-08-30 Thread Jacob Keller
On Mon, Aug 29, 2016 at 5:12 PM, Uma Srinivasan  wrote:
> This is great! Thanks Jake. If you happen to have the patch ID it
> would be helpful.
>
> Uma
>

http://public-inbox.org/git/1472236108.28343.5.ca...@intel.com/


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
This is great! Thanks Jake. If you happen to have the patch ID it
would be helpful.

Uma

On Mon, Aug 29, 2016 at 5:02 PM, Jacob Keller  wrote:
> On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
>> Uma Srinivasan  writes:
>>> This fixes my issue but what do you think? Is this the right way to
>>> fix it? Is there a better way?
>>
>> I think we already have a helper function that does a lot better
>> than "does it have a file called HEAD" to ask "is this a git
>> directory?" and its name I think is "is_git_directory" (I know, we
>> are not imaginative when naming our functions).
>>
>> As to the check makes sense in the context of this function, I am
>> not an expert to judge.  I'd expect Jens, Heiko and/or Stefan to
>> know better than I do.
>
> One of my patches adds a "is_git_directory()" call to this, and if we
> fail falls back to checking the .gitmodules and git-config for
> information regarding the submodule should it no longer be checked
> out. I suspect this patch will address your concern.
>
> Thanks,
> Jake


Re: git submodules implementation question

2016-08-29 Thread Jacob Keller
On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>> This fixes my issue but what do you think? Is this the right way to
>> fix it? Is there a better way?
>
> I think we already have a helper function that does a lot better
> than "does it have a file called HEAD" to ask "is this a git
> directory?" and its name I think is "is_git_directory" (I know, we
> are not imaginative when naming our functions).
>
> As to the check makes sense in the context of this function, I am
> not an expert to judge.  I'd expect Jens, Heiko and/or Stefan to
> know better than I do.

One of my patches adds a "is_git_directory()" call to this, and if we
fail falls back to checking the .gitmodules and git-config for
information regarding the submodule should it no longer be checked
out. I suspect this patch will address your concern.

Thanks,
Jake


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Yes, is_git_directory() is much better. Thanks for the pointer.

I will submit a patch unless I hear more suggestions from others.

Uma



On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  
>> wrote:
>>> Ok that makes sense. Thanks much.
>>>
>>> Uma
>>>
>> With respect to my original problem with a corrupted .git directory
>> under the submodule directory, I am thinking of adding the following 4
>> lines marked with ### to is_submodule_modified() to detect the
>> corrupted dir and die quickly instead of forking several child
>> processes:
>>
>>strbuf_addf(, "%s/.git", path);
>>git_dir = read_gitfile(buf.buf);
>>if (!git_dir) {
>>  ### strbuf_addf(_ref, "%s/HEAD",buf.buf);
>>  ### if (strbuf_read_file(_ref, head_ref.buf,0) < 0) {
>>   ### die("Corrupted .git dir in submodule %s", 
>> path);
>>  ###}
>>  git_dir = buf.buf;
>>   }
>>
>> This fixes my issue but what do you think? Is this the right way to
>> fix it? Is there a better way?
>
> I think we already have a helper function that does a lot better
> than "does it have a file called HEAD" to ask "is this a git
> directory?" and its name I think is "is_git_directory" (I know, we
> are not imaginative when naming our functions).
>
> As to the check makes sense in the context of this function, I am
> not an expert to judge.  I'd expect Jens, Heiko and/or Stefan to
> know better than I do.


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
Uma Srinivasan  writes:

> On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  
> wrote:
>> Ok that makes sense. Thanks much.
>>
>> Uma
>>
> With respect to my original problem with a corrupted .git directory
> under the submodule directory, I am thinking of adding the following 4
> lines marked with ### to is_submodule_modified() to detect the
> corrupted dir and die quickly instead of forking several child
> processes:
>
>strbuf_addf(, "%s/.git", path);
>git_dir = read_gitfile(buf.buf);
>if (!git_dir) {
>  ### strbuf_addf(_ref, "%s/HEAD",buf.buf);
>  ### if (strbuf_read_file(_ref, head_ref.buf,0) < 0) {
>   ### die("Corrupted .git dir in submodule %s", path);
>  ###}
>  git_dir = buf.buf;
>   }
>
> This fixes my issue but what do you think? Is this the right way to
> fix it? Is there a better way?

I think we already have a helper function that does a lot better
than "does it have a file called HEAD" to ask "is this a git
directory?" and its name I think is "is_git_directory" (I know, we
are not imaginative when naming our functions).

As to the check makes sense in the context of this function, I am
not an expert to judge.  I'd expect Jens, Heiko and/or Stefan to
know better than I do.


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
With respect to my original problem with a corrupted .git directory
under the submodule directory, I am thinking of adding the following 4
lines marked with ### to is_submodule_modified() to detect the
corrupted dir and die quickly instead of forking several child
processes:

   strbuf_addf(, "%s/.git", path);
   git_dir = read_gitfile(buf.buf);
   if (!git_dir) {
 ### strbuf_addf(_ref, "%s/HEAD",buf.buf);
 ### if (strbuf_read_file(_ref, head_ref.buf,0) < 0) {
  ### die("Corrupted .git dir in submodule %s", path);
 ###}
 git_dir = buf.buf;
  }

This fixes my issue but what do you think? Is this the right way to
fix it? Is there a better way?

Thanks,
Uma

On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  wrote:
> Ok that makes sense. Thanks much.
>
> Uma
>
> On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano  wrote:
>> On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  
>> wrote:
>>> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:

 A top-level superproject can have a submodule bound at its "dir/"
 directory, and "dir/.git" can either be a gitfile which you can read
 with read_gitfile() and point into somewhere in ".git/modules/" of
 the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
 Git directory.  So at the top of a superproject, you could do

 git clone $URL ./dir2
 git add dir2

 to clone an independent project into dir2 directory, and add it as a
 new submodule.  The fallback is to support such a layout.

>>> Thanks for the reply. However, in this case
>>>
>>>   git clone $URL ./dir2
>>>   git add dir2
>>>
>>> how will "dir2" get ever get registered as a submodule?
>>
>> With a separate invocation of "git config -f .gitmodules", of course.
>> The layout to use gitfile to point into .git/modules/ is a more recent
>> invention than the submodule support itself that "git add" knows about.
>> The code needs to support both layout as well as it can, and that
>> is what the "can we read it as gitfile?  If not, that directory itself may
>> be a git repository" codepath you asked about is doing.


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Ok that makes sense. Thanks much.

Uma

On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano  wrote:
> On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  
> wrote:
>> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
>>>
>>> A top-level superproject can have a submodule bound at its "dir/"
>>> directory, and "dir/.git" can either be a gitfile which you can read
>>> with read_gitfile() and point into somewhere in ".git/modules/" of
>>> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
>>> Git directory.  So at the top of a superproject, you could do
>>>
>>> git clone $URL ./dir2
>>> git add dir2
>>>
>>> to clone an independent project into dir2 directory, and add it as a
>>> new submodule.  The fallback is to support such a layout.
>>>
>> Thanks for the reply. However, in this case
>>
>>   git clone $URL ./dir2
>>   git add dir2
>>
>> how will "dir2" get ever get registered as a submodule?
>
> With a separate invocation of "git config -f .gitmodules", of course.
> The layout to use gitfile to point into .git/modules/ is a more recent
> invention than the submodule support itself that "git add" knows about.
> The code needs to support both layout as well as it can, and that
> is what the "can we read it as gitfile?  If not, that directory itself may
> be a git repository" codepath you asked about is doing.


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Thanks for the reply. However, in this case

  git clone $URL ./dir2
  git add dir2

how will "dir2" get ever get registered as a submodule? I don't see
how one can reach the "is_submodule_modified" routine for the scenario
above.

My understanding is that a sub-directory can be registered as a
submodule only with the "git submodule add " command. In this case
only a gitlink file is created within the sub-directory and not a .git
subdirectory. Please correct me if I am wrong.

Thanks again,
Uma

On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> git_dir = read_gitfile(buf.buf);
>> if (!git_dir)
>>
>> git_dir = buf.buf;
>>
>> Can anyone explain to me why we are replacing a failed reading of a
>> git file with the original sub directory name?
>
> A top-level superproject can have a submodule bound at its "dir/"
> directory, and "dir/.git" can either be a gitfile which you can read
> with read_gitfile() and point into somewhere in ".git/modules/" of
> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
> Git directory.  So at the top of a superproject, you could do
>
> git clone $URL ./dir2
> git add dir2
>
> to clone an independent project into dir2 directory, and add it as a
> new submodule.  The fallback is to support such a layout.
>


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  wrote:
> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
>>
>> A top-level superproject can have a submodule bound at its "dir/"
>> directory, and "dir/.git" can either be a gitfile which you can read
>> with read_gitfile() and point into somewhere in ".git/modules/" of
>> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
>> Git directory.  So at the top of a superproject, you could do
>>
>> git clone $URL ./dir2
>> git add dir2
>>
>> to clone an independent project into dir2 directory, and add it as a
>> new submodule.  The fallback is to support such a layout.
>>
> Thanks for the reply. However, in this case
>
>   git clone $URL ./dir2
>   git add dir2
>
> how will "dir2" get ever get registered as a submodule?

With a separate invocation of "git config -f .gitmodules", of course.
The layout to use gitfile to point into .git/modules/ is a more recent
invention than the submodule support itself that "git add" knows about.
The code needs to support both layout as well as it can, and that
is what the "can we read it as gitfile?  If not, that directory itself may
be a git repository" codepath you asked about is doing.


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
Uma Srinivasan  writes:

> git_dir = read_gitfile(buf.buf);
> if (!git_dir)
>
> git_dir = buf.buf;
>
> Can anyone explain to me why we are replacing a failed reading of a
> git file with the original sub directory name?

A top-level superproject can have a submodule bound at its "dir/"
directory, and "dir/.git" can either be a gitfile which you can read
with read_gitfile() and point into somewhere in ".git/modules/" of
the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
Git directory.  So at the top of a superproject, you could do

git clone $URL ./dir2
git add dir2

to clone an independent project into dir2 directory, and add it as a
new submodule.  The fallback is to support such a layout.