Re: git submodules implementation question
>>> The final version needs to be accompanied with tests to show the >>> effect of this change for callers. A test would set up a top-level >>> and submodule, deliberately break submodule/.git/ repository and >>> show what breaks and how without this change. Agreed! The repo where the original problem surfaced is huge and in fact "git status" appears to go into an infinite loop forking a zillion processes on the system. If the dev system is small, it brings it to a standstill. However, the good news is that I could build myself a smaller reproducer by doing the following: 1) mkdir sm_test 2) cd sm_test 3) git clone git://git.mysociety.org/commonlib commonlib 4) git init 5) git submodule add ./commonlib/ 6) cd commonlib/.git 7) rm -f all files After this "git status" will fork several thousand processes but it will ultimately fail as it runs into the path length max limit. I still don't know why this doesn't happen with my original problem repo. Perhaps I have to let it run overnight or perhaps it runs into other system limitations before then Anyway, with the fix "git status" fails quickly both in my reproducer (and original repo) with the following message. fatal: Not a git repository: '.git' fatal: 'git status --porcelain' failed in submodule commonlib Hope this helps. Uma On Thu, Sep 1, 2016 at 12:19 PM, Junio C Hamano wrote: > Stefan Beller writes: > >>> The final version needs to be accompanied with tests to show the >>> effect of this change for callers. A test would set up a top-level >>> and submodule, deliberately break submodule/.git/ repository and >>> show what breaks and how without this change. >> >> Tests are really good at providing this context as well, or to communicate >> the actual underlying problem, which is not quite clear to me. >> That is why I refrained from jumping into the discussion as I think the >> first few emails were dropped from the mailing list and I am missing context. > > I do not know where you started reading, but the gist of it is that > submodule.c spawns subprocess to run in the submodule's context by > assuming that chdir'ing into the of the submodule and running > it (i.e. cp.dir set to to drive start_command(&cp)) is > sufficient. When /.git (either it is a directory itself or it > points at a directory in .git/module/ in the superproject) is > a corrupt repository, running "git -C command" would try to > auto-detect the repository, because it thinks /.git is not a > repository and it thinks it is not at the top-level of the working > tree, and instead finds the repository of the top-level, which is > almost never what we want. >
Re: git submodules implementation question
> The final version needs to be accompanied with tests to show the > effect of this change for callers. A test would set up a top-level > and submodule, deliberately break submodule/.git/ repository and > show what breaks and how without this change. Tests are really good at providing this context as well, or to communicate the actual underlying problem, which is not quite clear to me. That is why I refrained from jumping into the discussion as I think the first few emails were dropped from the mailing list and I am missing context. > > So it is a bit more involved than just a single liner patch with one > paragraph log message. I may be able to find or make some time > after I tag the 2.10 final this weekend to do so myself. Thanks, Stefan
Re: git submodules implementation question
On Thu, Sep 1, 2016 at 1:21 PM, Junio C Hamano wrote: > Junio C Hamano writes: > >> Stefan Beller writes: >> The final version needs to be accompanied with tests to show the effect of this change for callers. A test would set up a top-level and submodule, deliberately break submodule/.git/ repository and show what breaks and how without this change. >>> >>> Tests are really good at providing this context as well, or to communicate >>> the actual underlying problem, which is not quite clear to me. >>> That is why I refrained from jumping into the discussion as I think the >>> first few emails were dropped from the mailing list and I am missing >>> context. >> >> I do not know where you started reading, but the gist of it is that >> submodule.c spawns subprocess to run in the submodule's context by >> assuming that chdir'ing into the of the submodule and running >> it (i.e. cp.dir set to to drive start_command(&cp)) is >> sufficient. When /.git (either it is a directory itself or it >> points at a directory in .git/module/ in the superproject) is >> a corrupt repository, running "git -C command" would try to >> auto-detect the repository, because it thinks /.git is not a >> repository and it thinks it is not at the top-level of the working >> tree, and instead finds the repository of the top-level, which is >> almost never what we want. > > This is with a test that covers the call in get_next_submodule() for > the parallel fetch callback. I think many of the codepaths will end > up recursing forever the same way without the fix in a submodule > repository that is broken in a similar way, but I didn't check, so > I do not consider this to be completed. Oh I see. That seems like a nasty bug. > > -- >8 -- > Subject: submodule: avoid auto-discovery in prepare_submodule_repo_env() > > The function is used to set up the environment variable used in a > subprocess we spawn in a submodule directory. The callers set up a > child_process structure, find the working tree path of one submodule > and set .dir field to it, and then use start_command() API to spawn > the subprocess like "status", "fetch", etc. > > When this happens, we expect that the ".git" (either a directory or > a gitfile that points at the real location) in the current working > directory of the subprocess MUST be the repository for the submodule. > > If this ".git" thing is a corrupt repository, however, because > prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the > subprocess will see ".git", thinks it is not a repository, and > attempt to find one by going up, likely to end up in finding the > repository of the superproject. In some codepaths, this will cause > a command run with the "--recurse-submodules" option to recurse > forever. > > By exporting GIT_DIR=.git, disable the auto-discovery logic in the > subprocess, which would instead stop it and report an error. and GIT_DIR=.git works for both .git files as well as the old fashioned way, with the submodule repository at .git/, although that is not really documented. > > Not-signed-off-yet. > --- > submodule.c | 1 + > t/t5526-fetch-submodules.sh | 29 + > 2 files changed, 30 insertions(+) > > diff --git a/submodule.c b/submodule.c > index 1b5cdfb..e8258f0 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) > if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) > argv_array_push(out, *var); > } > + argv_array_push(out, "GIT_DIR=.git"); > } > diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh > index 954d0e4..b2dee30 100755 > --- a/t/t5526-fetch-submodules.sh > +++ b/t/t5526-fetch-submodules.sh > @@ -485,4 +485,33 @@ test_expect_success 'fetching submodules respects > parallel settings' ' > ) > ' > > +test_expect_success 'fetching submodule into a broken repository' ' > + # Prepare src and src/sub nested in it > + git init src && > + ( > + cd src && > + git init sub && > + git -C sub commit --allow-empty -m "initial in sub" && > + git submodule add -- ./sub sub && > + git commit -m "initial in top" > + ) && This is not needed, as setup() set up some repositories for you. > + > + # Clone the old-fashoned way > + git clone src dst && if you don't go with your own setup, maybe: git clone . dst git -C dst clone ../submodule submodule # and further down s/sub/submodule/ > + git -C dst clone ../src/sub sub && > + > + # Make sure that old-fashoned layout is still supported fashioned > + git -C dst status && > + > + # Recursive-fetch works fine > + git -C dst fetch --recurse-submodules && > + > + # Break the receiving submodule > + rm -f dst/sub/.git/HEAD && > + > + # Recursive-fetc
Re: git submodules implementation question
Uma Srinivasan writes: > Yes, this one line fix addresses my problem. > > So, what is the next step? Will someone submit a patch or should I? > Please note that I've never submitted a patch before, but I don't mind > learning how to. The final version needs to be accompanied with tests to show the effect of this change for callers. A test would set up a top-level and submodule, deliberately break submodule/.git/ repository and show what breaks and how without this change. For example, there is a call in ok_to_remove_submodule() to run "git status" in the submodule directory. A test writer needs to find who calls ok_to_remove_submodule() in what codepath (e.g. builtin/rm.c). Then after figuring out under what condition the check is triggered, write a test that runs "git rm" in a way that this change makes a difference. Hopefully, the test will fail without this change, and will pass with this change. So it is a bit more involved than just a single liner patch with one paragraph log message. I may be able to find or make some time after I tag the 2.10 final this weekend to do so myself. >> --- a/submodule.c >> +++ b/submodule.c >> @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) >> if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) >> argv_array_push(out, *var); >> } >> + argv_array_push(out, "GIT_DIR=.git"); >> } >>
Re: git submodules implementation question
Stefan Beller writes: >> +test_expect_success 'fetching submodule into a broken repository' ' >> + # Prepare src and src/sub nested in it >> + git init src && >> + ( >> + cd src && >> + git init sub && >> + git -C sub commit --allow-empty -m "initial in sub" && >> + git submodule add -- ./sub sub && >> + git commit -m "initial in top" >> + ) && > > This is not needed, as setup() set up some repositories for you. I didn't want any random cruft left behind in the top-level by previous tests, so this is very much deliberate.
Re: git submodules implementation question
Junio C Hamano writes: If we add # "git diff" should terminate with an error. # NOTE: without fix this will recurse forever! test_must_fail git -C dst diff && after breaking the repository, we can also see "git diff" recurse forever, because it wants to know if "sub" submodule is modified, attempts to run "git status" in there, and ends up running that command in the context of the parent repository. I am tempted to cheat and commit this, even though this test is no longer about fetching submodules. -- >8 -- [PATCH] submodule: avoid auto-discovery in prepare_submodule_repo_env() The function is used to set up the environment variable used in a subprocess we spawn in a submodule directory. The callers set up a child_process structure, find the working tree path of one submodule and set .dir field to it, and then use start_command() API to spawn the subprocess like "status", "fetch", etc. When this happens, we expect that the ".git" (either a directory or a gitfile that points at the real location) in the current working directory of the subprocess MUST be the repository for the submodule. If this ".git" thing is a corrupt repository, however, because prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the subprocess will see ".git", thinks it is not a repository, and attempt to find one by going up, likely to end up in finding the repository of the superproject. In some codepaths, this will cause a command run with the "--recurse-submodules" option to recurse forever. By exporting GIT_DIR=.git, disable the auto-discovery logic in the subprocess, which would instead stop it and report an error. The test illustrates existing problems in a few callsites of this function. Without this fix, "git fetch --recurse-submodules", "git status" and "git diff" keep recursing forever. Signed-off-by: Junio C Hamano --- submodule.c | 1 + t/t5526-fetch-submodules.sh | 35 +++ 2 files changed, 36 insertions(+) diff --git a/submodule.c b/submodule.c index 4532b11..2801fbb 100644 --- a/submodule.c +++ b/submodule.c @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) argv_array_push(out, *var); } + argv_array_push(out, "GIT_DIR=.git"); } diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh index 954d0e4..f3b0a8d 100755 --- a/t/t5526-fetch-submodules.sh +++ b/t/t5526-fetch-submodules.sh @@ -485,4 +485,39 @@ test_expect_success 'fetching submodules respects parallel settings' ' ) ' +test_expect_success 'fetching submodule into a broken repository' ' + # Prepare src and src/sub nested in it + git init src && + ( + cd src && + git init sub && + git -C sub commit --allow-empty -m "initial in sub" && + git submodule add -- ./sub sub && + git commit -m "initial in top" + ) && + + # Clone the old-fashoned way + git clone src dst && + git -C dst clone ../src/sub sub && + + # Make sure that old-fashoned layout is still supported + git -C dst status && + + # "diff" would find no change + git -C dst diff --exit-code && + + # Recursive-fetch works fine + git -C dst fetch --recurse-submodules && + + # Break the receiving submodule + rm -f dst/sub/.git/HEAD && + + # NOTE: without the fix the following tests will recurse forever! + # They should terminate with an error. + + test_must_fail git -C dst status && + test_must_fail git -C dst diff && + test_must_fail git -C dst fetch --recurse-submodules +' + test_done -- 2.10.0-rc2-314-g775ea9a
Re: git submodules implementation question
Uma Srinivasan writes: > Anyway, with the fix "git status" fails quickly both in my reproducer > (and original repo) with the following message. > fatal: Not a git repository: '.git' > fatal: 'git status --porcelain' failed in submodule commonlib Thanks, that is exactly what I wanted to see as an outcome when I did the one-liner patch.
Re: git submodules implementation question
Junio C Hamano writes: > Stefan Beller writes: > >>> The final version needs to be accompanied with tests to show the >>> effect of this change for callers. A test would set up a top-level >>> and submodule, deliberately break submodule/.git/ repository and >>> show what breaks and how without this change. >> >> Tests are really good at providing this context as well, or to communicate >> the actual underlying problem, which is not quite clear to me. >> That is why I refrained from jumping into the discussion as I think the >> first few emails were dropped from the mailing list and I am missing context. > > I do not know where you started reading, but the gist of it is that > submodule.c spawns subprocess to run in the submodule's context by > assuming that chdir'ing into the of the submodule and running > it (i.e. cp.dir set to to drive start_command(&cp)) is > sufficient. When /.git (either it is a directory itself or it > points at a directory in .git/module/ in the superproject) is > a corrupt repository, running "git -C command" would try to > auto-detect the repository, because it thinks /.git is not a > repository and it thinks it is not at the top-level of the working > tree, and instead finds the repository of the top-level, which is > almost never what we want. This is with a test that covers the call in get_next_submodule() for the parallel fetch callback. I think many of the codepaths will end up recursing forever the same way without the fix in a submodule repository that is broken in a similar way, but I didn't check, so I do not consider this to be completed. -- >8 -- Subject: submodule: avoid auto-discovery in prepare_submodule_repo_env() The function is used to set up the environment variable used in a subprocess we spawn in a submodule directory. The callers set up a child_process structure, find the working tree path of one submodule and set .dir field to it, and then use start_command() API to spawn the subprocess like "status", "fetch", etc. When this happens, we expect that the ".git" (either a directory or a gitfile that points at the real location) in the current working directory of the subprocess MUST be the repository for the submodule. If this ".git" thing is a corrupt repository, however, because prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the subprocess will see ".git", thinks it is not a repository, and attempt to find one by going up, likely to end up in finding the repository of the superproject. In some codepaths, this will cause a command run with the "--recurse-submodules" option to recurse forever. By exporting GIT_DIR=.git, disable the auto-discovery logic in the subprocess, which would instead stop it and report an error. Not-signed-off-yet. --- submodule.c | 1 + t/t5526-fetch-submodules.sh | 29 + 2 files changed, 30 insertions(+) diff --git a/submodule.c b/submodule.c index 1b5cdfb..e8258f0 100644 --- a/submodule.c +++ b/submodule.c @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) argv_array_push(out, *var); } + argv_array_push(out, "GIT_DIR=.git"); } diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh index 954d0e4..b2dee30 100755 --- a/t/t5526-fetch-submodules.sh +++ b/t/t5526-fetch-submodules.sh @@ -485,4 +485,33 @@ test_expect_success 'fetching submodules respects parallel settings' ' ) ' +test_expect_success 'fetching submodule into a broken repository' ' + # Prepare src and src/sub nested in it + git init src && + ( + cd src && + git init sub && + git -C sub commit --allow-empty -m "initial in sub" && + git submodule add -- ./sub sub && + git commit -m "initial in top" + ) && + + # Clone the old-fashoned way + git clone src dst && + git -C dst clone ../src/sub sub && + + # Make sure that old-fashoned layout is still supported + git -C dst status && + + # Recursive-fetch works fine + git -C dst fetch --recurse-submodules && + + # Break the receiving submodule + rm -f dst/sub/.git/HEAD && + + # Recursive-fetch must terminate + # NOTE: without fix this will recurse forever! + test_must_fail git -C dst fetch --recurse-submodules +' + test_done
Re: git submodules implementation question
Stefan Beller writes: >> The final version needs to be accompanied with tests to show the >> effect of this change for callers. A test would set up a top-level >> and submodule, deliberately break submodule/.git/ repository and >> show what breaks and how without this change. > > Tests are really good at providing this context as well, or to communicate > the actual underlying problem, which is not quite clear to me. > That is why I refrained from jumping into the discussion as I think the > first few emails were dropped from the mailing list and I am missing context. I do not know where you started reading, but the gist of it is that submodule.c spawns subprocess to run in the submodule's context by assuming that chdir'ing into the of the submodule and running it (i.e. cp.dir set to to drive start_command(&cp)) is sufficient. When /.git (either it is a directory itself or it points at a directory in .git/module/ in the superproject) is a corrupt repository, running "git -C command" would try to auto-detect the repository, because it thinks /.git is not a repository and it thinks it is not at the top-level of the working tree, and instead finds the repository of the top-level, which is almost never what we want.
Re: git submodules implementation question
Yes, this one line fix addresses my problem. So, what is the next step? Will someone submit a patch or should I? Please note that I've never submitted a patch before, but I don't mind learning how to. Thanks, Uma > --- a/submodule.c > +++ b/submodule.c > @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) > if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) > argv_array_push(out, *var); > } > + argv_array_push(out, "GIT_DIR=.git"); > } >
Re: git submodules implementation question
Junio C Hamano writes: > I was wondering if we should unconditionally stuff GIT_DIR= repository location for the submodule> in the cp.env_array passed to > the function prepare_submodule_repo_env(). As cp.dir will be set to > the submodule's working tree, there is no need to set GIT_WORK_TREE > and export it, I think, although it would not hurt. After checking all callers of prepare_submodule_repo_env() and saw that cp.dir is set to the top of working directory for the submodule, I wonder if something as simple and stupid like the attached patch is sufficient. Our subprocess will go there, and there should be a .git embedded directory or a .git file that points into .git/modules/* in the superproject, and either way, it should use .git directly underneath it. submodule.c | 1 + 1 file changed, 1 insertion(+) diff --git a/submodule.c b/submodule.c index 1b5cdfb..e8258f0 100644 --- a/submodule.c +++ b/submodule.c @@ -1160,4 +1160,5 @@ void prepare_submodule_repo_env(struct argv_array *out) if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) argv_array_push(out, *var); } + argv_array_push(out, "GIT_DIR=.git"); }
Re: git submodules implementation question
Here's one more attempt where I changed prepare_submodule_repo_env() to take the submodule git dir as an argument. Sorry for including this long code diff inline. If there's a better way to have this discussion with code please let me know. Thanks, Uma diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 9d79f19..0741f6c 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -465,7 +465,7 @@ static int clone_submodule(const char *path, const char *gitdir, const char *url argv_array_push(&cp.args, path); cp.git_cmd = 1; - prepare_submodule_repo_env(&cp.env_array); + prepare_submodule_repo_env(&cp.env_array, gitdir); cp.no_stdin = 1; return run_command(&cp); diff --git a/submodule.c b/submodule.c index 5a62aa2..3d9174a 100644 --- a/submodule.c +++ b/submodule.c @@ -522,11 +522,31 @@ static int has_remote(const char *refname, const struct object_id *oid, return 1; } +static const char *submodule_git_dir = NULL; +const char *get_submodule_git_dir(const char *path, struct strbuf *bufptr) +{ + strbuf_addf(bufptr, "%s/.git", path); + submodule_git_dir = read_gitfile(bufptr->buf); + if (!submodule_git_dir) + submodule_git_dir = bufptr->buf; + if (!is_directory(submodule_git_dir)) { + return NULL; + } + return submodule_git_dir; +} + static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) { if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) return 0; + struct strbuf gitdirbuf = STRBUF_INIT; +const char *sm_gitdir = get_submodule_git_dir(path, &gitdirbuf); + if (sm_gitdir == NULL) { + strbuf_release(&gitdirbuf); + return 0; + } if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; @@ -535,7 +555,7 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20 argv[1] = sha1_to_hex(sha1); cp.argv = argv; - prepare_submodule_repo_env(&cp.env_array); + prepare_submodule_repo_env(&cp.env_array, sm_gitdir); cp.git_cmd = 1; cp.no_stdin = 1; cp.out = -1; @@ -551,6 +571,7 @@ static int submodule_needs_pushing(const char *path, const unsigned char sha1[20 return needs_pushing; } + strbuf_release(&gitdirbuf); return 0; } @@ -617,12 +638,18 @@ static int push_submodule(const char *path) if (add_submodule_odb(path)) return 1; + struct strbuf gitdirbuf = STRBUF_INIT; +const char *sm_gitdir = get_submodule_git_dir(path, &gitdirbuf); + if (sm_gitdir == NULL) { + strbuf_release(&gitdirbuf); + return 0; + } if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; const char *argv[] = {"push", NULL}; cp.argv = argv; - prepare_submodule_repo_env(&cp.env_array); + prepare_submodule_repo_env(&cp.env_array, sm_gitdir); cp.git_cmd = 1; cp.no_stdin = 1; cp.dir = path; @@ -631,6 +658,7 @@ static int push_submodule(const char *path) close(cp.out); } + strbuf_release(&gitdirbuf); return 1; } @@ -665,10 +693,16 @@ static int is_submodule_commit_present(const char *path, unsigned char sha1[20]) struct child_process cp = CHILD_PROCESS_INIT; const char *argv[] = {"rev-list", "-n", "1", NULL, "--not", "--all", NULL}; struct strbuf buf = STRBUF_INIT; + struct strbuf gitdirbuf = STRBUF_INIT; + const char *sm_gitdir = get_submodule_git_dir(path, &gitdirbuf); + if (sm_gitdir == NULL) { + strbuf_release(&gitdirbuf); + return 0; + } argv[3] = sha1_to_hex(sha1); cp.argv = argv; - prepare_submodule_repo_env(&cp.env_array); + prepare_submodule_repo_env(&cp.env_array, sm_gitdir); cp.git_cmd = 1; cp.no_stdin = 1; cp.dir = path; @@ -676,6 +710,7 @@ static int is_submodule_commit_present(const char *path, unsigned char sha1[20]) is_present = 1; strbuf_release(&buf); + strbuf_release(&gitdirbuf); } return is_present; } @@ -851,7 +886,7 @@ static int get_next_submodule(struct child_process *cp, if (is_directory(git_dir)) { child_process_init(cp); cp-
Re: git submodules implementation question
> We want to affect only the process we are going to spawn to work > inside the submodule, not ourselves, which is what this call does; > this does not sound like a good idea. Okay, in that case I would have to pass the "git_dir" as a new argument to prepare_submodule_repo_env(). I know what to pass from the is_submodule_modified() caller. I don't think it's all that obvious for the other callers. Thanks, Uma On Wed, Aug 31, 2016 at 11:44 AM, Junio C Hamano wrote: > Uma Srinivasan writes: > >> diff --git a/submodule.c b/submodule.c >> index 5a62aa2..23443a7 100644 >> --- a/submodule.c >> +++ b/submodule.c >> @@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path, >> int ignore_untracked) >> return 0; >> >> } >> + /* stuff submodule git dir into env var */ >> + set_git_dir(git_dir); > > We want to affect only the process we are going to spawn to work > inside the submodule, not ourselves, which is what this call does; > this does not sound like a good idea. >
Re: git submodules implementation question
Uma Srinivasan writes: > diff --git a/submodule.c b/submodule.c > index 5a62aa2..23443a7 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path, > int ignore_untracked) > return 0; > > } > + /* stuff submodule git dir into env var */ > + set_git_dir(git_dir); We want to affect only the process we are going to spawn to work inside the submodule, not ourselves, which is what this call does; this does not sound like a good idea.
Re: git submodules implementation question
FWIW, I tried the following quick change based on the info. provided by Junio and it seems to "fix" my original problem. I'll let you experts figure out if we need a more complete fix or not. diff --git a/submodule.c b/submodule.c index 5a62aa2..23443a7 100644 --- a/submodule.c +++ b/submodule.c @@ -960,6 +960,9 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return 0; } + /* stuff submodule git dir into env var */ + set_git_dir(git_dir); + strbuf_reset(&buf); if (ignore_untracked) @@ -1279,5 +1282,9 @@ void prepare_submodule_repo_env(struct argv_array *out) for (var = local_repo_env; *var; var++) { if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) argv_array_push(out, *var); + if (strcmp(*var, GIT_DIR_ENVIRONMENT)) + argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT, +get_git_dir()); } + } Thanks, Uma On Wed, Aug 31, 2016 at 9:42 AM, Junio C Hamano wrote: > Uma Srinivasan writes: > >>> I might suggest to update prepare_submodule_repo_env() so that the >>> spawned process will *NOT* have to guess where the working tree and >>> repository by exporting GIT_DIR (set to "git_dir" we discover above) >>> and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the >>> working tree of the submodule). That would stop the "git status" to >>> guess (and fooled by a corrupted dir/.git that is not a git >>> repository). >> >> Here's where I am struggling with my lack of knowledge of git >> internals and the implementation particularly in the context of how >> environment variables are passed from the parent to the child process. > > Ah, I was primarily addressing Jacob in the latter part of my > message, as he's looked at similar things in his recent topic. > >> Are you suggesting that we set up the child process environment array >> (the "out" argument) in prepare_submodule_repo_env() to include >> GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to >> CONFIG_DATA_ENVIRONMENT that is there now? > > I was wondering if we should unconditionally stuff GIT_DIR= repository location for the submodule> in the cp.env_array passed to > the function prepare_submodule_repo_env(). As cp.dir will be set to > the submodule's working tree, there is no need to set GIT_WORK_TREE > and export it, I think, although it would not hurt. > > After all, we _are_ going into a separate and different project's > repository (that is what a submodule is), and we _know_ where its > repository data (i.e. GIT_DIR) and the location of its working tree > (i.e. GIT_WORK_TREE). There is no reason for the process that will > work in the submodule to go through the usual "do we have .git in > our $cwd that is a repository? if not how about the parent directory > of $cwd? go up until we find one and that directory is the top of > the working tree" discovery. > > More importantly, this matters when your GIT_DIR for the submodule > is somehow corrupt. The discovery process would say "there is .git > in $cwd but that is not a repository" and continue upwards, which > likely would find the repository for the top-level superproject, > which definitely is _not_ want you want to happen. And one way to > avoid it is to tell the setup.c code that it should not do the > discovery by being explicit. >
Re: git submodules implementation question
Uma Srinivasan writes: >> I might suggest to update prepare_submodule_repo_env() so that the >> spawned process will *NOT* have to guess where the working tree and >> repository by exporting GIT_DIR (set to "git_dir" we discover above) >> and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the >> working tree of the submodule). That would stop the "git status" to >> guess (and fooled by a corrupted dir/.git that is not a git >> repository). > > Here's where I am struggling with my lack of knowledge of git > internals and the implementation particularly in the context of how > environment variables are passed from the parent to the child process. Ah, I was primarily addressing Jacob in the latter part of my message, as he's looked at similar things in his recent topic. > Are you suggesting that we set up the child process environment array > (the "out" argument) in prepare_submodule_repo_env() to include > GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to > CONFIG_DATA_ENVIRONMENT that is there now? I was wondering if we should unconditionally stuff GIT_DIR= in the cp.env_array passed to the function prepare_submodule_repo_env(). As cp.dir will be set to the submodule's working tree, there is no need to set GIT_WORK_TREE and export it, I think, although it would not hurt. After all, we _are_ going into a separate and different project's repository (that is what a submodule is), and we _know_ where its repository data (i.e. GIT_DIR) and the location of its working tree (i.e. GIT_WORK_TREE). There is no reason for the process that will work in the submodule to go through the usual "do we have .git in our $cwd that is a repository? if not how about the parent directory of $cwd? go up until we find one and that directory is the top of the working tree" discovery. More importantly, this matters when your GIT_DIR for the submodule is somehow corrupt. The discovery process would say "there is .git in $cwd but that is not a repository" and continue upwards, which likely would find the repository for the top-level superproject, which definitely is _not_ want you want to happen. And one way to avoid it is to tell the setup.c code that it should not do the discovery by being explicit.
Re: git submodules implementation question
On Tue, Aug 30, 2016 at 10:53 AM, Junio C Hamano wrote: > Uma Srinivasan writes: > >> I think the following fix is still needed to is_submodule_modified(): >> >> strbuf_addf(&buf, "%s/.git", path); >> git_dir = read_gitfile(buf.buf); >> if (!git_dir) { >> git_dir = buf.buf; >> ==> if (!is_git_directory(git_dir)) { >> ==> die("Corrupted .git dir in submodule %s", path); >> ==> } >> } > > If it is so important that git_dir is a valid Git > repository after > > git_dir = read_gitfile(buf.buf); > if (!git_dir) > git_dir = buf.buf; > > is done to determine what "git_dir" to use, it seems to me that it > does not make much sense to check ONLY dir/.git that is a directory > and leave .git/modules/$name that dir/.git file points at unchecked. > Okay. > But there is much bigger problem with the above addition, I think. > There also can be a case where dir/ does not even have ".git" in it. > A submodule the user is not interested in will just have an empty > directory there, and immediately after the above three lines I > reproduced above, we have this > > if (!is_directory(git_dir)) { > strbuf_release(&buf); > return 0; > } > Good to know about this use case. > The added check will break the use case. If anything, that check, > if this code needs to verify that "git_dir" points at a valid Git > repository, should come _after_ that. Okay. > > Shouldn't "git-status --porcelain" run in the submodule notice that > it is not a valid repository and quickly die anyway? Why should we > even check before spawning that process in the first place? I don't see any reason to disagree with this. > > I might suggest to update prepare_submodule_repo_env() so that the > spawned process will *NOT* have to guess where the working tree and > repository by exporting GIT_DIR (set to "git_dir" we discover above) > and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the > working tree of the submodule). That would stop the "git status" to > guess (and fooled by a corrupted dir/.git that is not a git > repository). > Here's where I am struggling with my lack of knowledge of git internals and the implementation particularly in the context of how environment variables are passed from the parent to the child process. Are you suggesting that we set up the child process environment array (the "out" argument) in prepare_submodule_repo_env() to include GIT_DIR_ENVIRONMENT and GIT_WORK_TREE_ENVIRONMENT in addition to CONFIG_DATA_ENVIRONMENT that is there now? Can I use the strcmp and argv_array_push() calls to do this? There are several callers for this routine prepare_submodule_repo_env(). Would the caller pass the git dir and work tree as arguments to this routine or would the caller set up the environment variables as needed? Is there any documentation or other example code where similar passing of env. vars to a child process is being done? Sorry for all these questions but I'm still a novice as far as the code is concerned. Thanks for your help. Uma
Re: git submodules implementation question
Uma Srinivasan writes: > I think the following fix is still needed to is_submodule_modified(): > > strbuf_addf(&buf, "%s/.git", path); > git_dir = read_gitfile(buf.buf); > if (!git_dir) { > git_dir = buf.buf; > ==> if (!is_git_directory(git_dir)) { > ==> die("Corrupted .git dir in submodule %s", path); > ==> } > } If it is so important that git_dir is a valid Git repository after git_dir = read_gitfile(buf.buf); if (!git_dir) git_dir = buf.buf; is done to determine what "git_dir" to use, it seems to me that it does not make much sense to check ONLY dir/.git that is a directory and leave .git/modules/$name that dir/.git file points at unchecked. But there is much bigger problem with the above addition, I think. There also can be a case where dir/ does not even have ".git" in it. A submodule the user is not interested in will just have an empty directory there, and immediately after the above three lines I reproduced above, we have this if (!is_directory(git_dir)) { strbuf_release(&buf); return 0; } The added check will break the use case. If anything, that check, if this code needs to verify that "git_dir" points at a valid Git repository, should come _after_ that. Shouldn't "git-status --porcelain" run in the submodule notice that it is not a valid repository and quickly die anyway? Why should we even check before spawning that process in the first place? I might suggest to update prepare_submodule_repo_env() so that the spawned process will *NOT* have to guess where the working tree and repository by exporting GIT_DIR (set to "git_dir" we discover above) and GIT_WORK_TREE (set to "." as cp.dir is set to the path to the working tree of the submodule). That would stop the "git status" to guess (and fooled by a corrupted dir/.git that is not a git repository).
Re: git submodules implementation question
Thanks for the patch. Unfortunately, it doesn't help in my case as it invokes the is_submodule_modified() routine which you didn't modify. Here's my call trace #0 is_submodule_modified (path=path@entry=0x17c2f08 "groc", ignore_untracked=0) at submodule.c:939 #1 0x004aa4dc in match_stat_with_submodule ( diffopt=diffopt@entry=0x7fffde18, ce=ce@entry=0x17c2eb0, st=st@entry=0x7fffd840, ce_option=ce_option@entry=0, dirty_submodule=dirty_submodule@entry=0x7fffd83c) at diff-lib.c:81 #2 0x004ab4f5 in run_diff_files (revs=revs@entry=0x7fffd920, option=option@entry=0) at diff-lib.c:217 #3 0x0054c0d4 in wt_status_collect_changes_worktree (s=s@entry=0x7de280 ) at wt-status.c:559 #4 0x0054ecf6 in wt_status_collect (s=s@entry=0x7de280 ) at wt-status.c:678 #5 0x00422171 in cmd_status (argc=, argv=, prefix=0x0) at builtin/commit.c:1390 #6 0x00405abe in run_builtin (argv=, argc=, p=) at git.c:352 #7 handle_builtin (argc=1, argv=0x7fffe570) at git.c:551 #8 0x00405dd8 in run_argv (argv=0x7fffe320, argcp=0x7fffe32c) at git.c:606 #9 cmd_main (argc=1, argc@entry=2, argv=0x7fffe570, argv@entry=0x7fffe568) at git.c:678 #10 0x00405060 in main (argc=2, argv=0x7fffe568) at common-main.c:40 I think the following fix is still needed to is_submodule_modified(): strbuf_addf(&buf, "%s/.git", path); git_dir = read_gitfile(buf.buf); if (!git_dir) { git_dir = buf.buf; ==> if (!is_git_directory(git_dir)) { ==> die("Corrupted .git dir in submodule %s", path); ==> } } Thanks, Uma On Mon, Aug 29, 2016 at 11:23 PM, Jacob Keller wrote: > On Mon, Aug 29, 2016 at 11:09 PM, Jacob Keller wrote: >> On Mon, Aug 29, 2016 at 5:12 PM, Uma Srinivasan >> wrote: >>> This is great! Thanks Jake. If you happen to have the patch ID it >>> would be helpful. >>> >>> Uma >>> >> >> http://public-inbox.org/git/1472236108.28343.5.ca...@intel.com/ > > > Actually correct patch is > http://public-inbox.org/git/20160825233243.30700-6-jacob.e.kel...@intel.com/ > > Thanks, > Jake
Re: git submodules implementation question
On Mon, Aug 29, 2016 at 11:09 PM, Jacob Keller wrote: > On Mon, Aug 29, 2016 at 5:12 PM, Uma Srinivasan > wrote: >> This is great! Thanks Jake. If you happen to have the patch ID it >> would be helpful. >> >> Uma >> > > http://public-inbox.org/git/1472236108.28343.5.ca...@intel.com/ Actually correct patch is http://public-inbox.org/git/20160825233243.30700-6-jacob.e.kel...@intel.com/ Thanks, Jake
Re: git submodules implementation question
On Mon, Aug 29, 2016 at 5:12 PM, Uma Srinivasan wrote: > This is great! Thanks Jake. If you happen to have the patch ID it > would be helpful. > > Uma > http://public-inbox.org/git/1472236108.28343.5.ca...@intel.com/
Re: git submodules implementation question
This is great! Thanks Jake. If you happen to have the patch ID it would be helpful. Uma On Mon, Aug 29, 2016 at 5:02 PM, Jacob Keller wrote: > On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano wrote: >> Uma Srinivasan writes: >>> This fixes my issue but what do you think? Is this the right way to >>> fix it? Is there a better way? >> >> I think we already have a helper function that does a lot better >> than "does it have a file called HEAD" to ask "is this a git >> directory?" and its name I think is "is_git_directory" (I know, we >> are not imaginative when naming our functions). >> >> As to the check makes sense in the context of this function, I am >> not an expert to judge. I'd expect Jens, Heiko and/or Stefan to >> know better than I do. > > One of my patches adds a "is_git_directory()" call to this, and if we > fail falls back to checking the .gitmodules and git-config for > information regarding the submodule should it no longer be checked > out. I suspect this patch will address your concern. > > Thanks, > Jake
Re: git submodules implementation question
On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano wrote: > Uma Srinivasan writes: >> This fixes my issue but what do you think? Is this the right way to >> fix it? Is there a better way? > > I think we already have a helper function that does a lot better > than "does it have a file called HEAD" to ask "is this a git > directory?" and its name I think is "is_git_directory" (I know, we > are not imaginative when naming our functions). > > As to the check makes sense in the context of this function, I am > not an expert to judge. I'd expect Jens, Heiko and/or Stefan to > know better than I do. One of my patches adds a "is_git_directory()" call to this, and if we fail falls back to checking the .gitmodules and git-config for information regarding the submodule should it no longer be checked out. I suspect this patch will address your concern. Thanks, Jake
Re: git submodules implementation question
Yes, is_git_directory() is much better. Thanks for the pointer. I will submit a patch unless I hear more suggestions from others. Uma On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano wrote: > Uma Srinivasan writes: > >> On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan >> wrote: >>> Ok that makes sense. Thanks much. >>> >>> Uma >>> >> With respect to my original problem with a corrupted .git directory >> under the submodule directory, I am thinking of adding the following 4 >> lines marked with ### to is_submodule_modified() to detect the >> corrupted dir and die quickly instead of forking several child >> processes: >> >>strbuf_addf(&buf, "%s/.git", path); >>git_dir = read_gitfile(buf.buf); >>if (!git_dir) { >> ### strbuf_addf(&head_ref, "%s/HEAD",buf.buf); >> ### if (strbuf_read_file(&temp_ref, head_ref.buf,0) < 0) { >> ### die("Corrupted .git dir in submodule %s", >> path); >> ###} >> git_dir = buf.buf; >> } >> >> This fixes my issue but what do you think? Is this the right way to >> fix it? Is there a better way? > > I think we already have a helper function that does a lot better > than "does it have a file called HEAD" to ask "is this a git > directory?" and its name I think is "is_git_directory" (I know, we > are not imaginative when naming our functions). > > As to the check makes sense in the context of this function, I am > not an expert to judge. I'd expect Jens, Heiko and/or Stefan to > know better than I do.
Re: git submodules implementation question
Uma Srinivasan writes: > On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan > wrote: >> Ok that makes sense. Thanks much. >> >> Uma >> > With respect to my original problem with a corrupted .git directory > under the submodule directory, I am thinking of adding the following 4 > lines marked with ### to is_submodule_modified() to detect the > corrupted dir and die quickly instead of forking several child > processes: > >strbuf_addf(&buf, "%s/.git", path); >git_dir = read_gitfile(buf.buf); >if (!git_dir) { > ### strbuf_addf(&head_ref, "%s/HEAD",buf.buf); > ### if (strbuf_read_file(&temp_ref, head_ref.buf,0) < 0) { > ### die("Corrupted .git dir in submodule %s", path); > ###} > git_dir = buf.buf; > } > > This fixes my issue but what do you think? Is this the right way to > fix it? Is there a better way? I think we already have a helper function that does a lot better than "does it have a file called HEAD" to ask "is this a git directory?" and its name I think is "is_git_directory" (I know, we are not imaginative when naming our functions). As to the check makes sense in the context of this function, I am not an expert to judge. I'd expect Jens, Heiko and/or Stefan to know better than I do.
Re: git submodules implementation question
With respect to my original problem with a corrupted .git directory under the submodule directory, I am thinking of adding the following 4 lines marked with ### to is_submodule_modified() to detect the corrupted dir and die quickly instead of forking several child processes: strbuf_addf(&buf, "%s/.git", path); git_dir = read_gitfile(buf.buf); if (!git_dir) { ### strbuf_addf(&head_ref, "%s/HEAD",buf.buf); ### if (strbuf_read_file(&temp_ref, head_ref.buf,0) < 0) { ### die("Corrupted .git dir in submodule %s", path); ###} git_dir = buf.buf; } This fixes my issue but what do you think? Is this the right way to fix it? Is there a better way? Thanks, Uma On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan wrote: > Ok that makes sense. Thanks much. > > Uma > > On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano wrote: >> On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan >> wrote: >>> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano wrote: A top-level superproject can have a submodule bound at its "dir/" directory, and "dir/.git" can either be a gitfile which you can read with read_gitfile() and point into somewhere in ".git/modules/" of the top-level superproject. "dir/.git" can _ALSO_ be a fully valid Git directory. So at the top of a superproject, you could do git clone $URL ./dir2 git add dir2 to clone an independent project into dir2 directory, and add it as a new submodule. The fallback is to support such a layout. >>> Thanks for the reply. However, in this case >>> >>> git clone $URL ./dir2 >>> git add dir2 >>> >>> how will "dir2" get ever get registered as a submodule? >> >> With a separate invocation of "git config -f .gitmodules", of course. >> The layout to use gitfile to point into .git/modules/ is a more recent >> invention than the submodule support itself that "git add" knows about. >> The code needs to support both layout as well as it can, and that >> is what the "can we read it as gitfile? If not, that directory itself may >> be a git repository" codepath you asked about is doing.
Re: git submodules implementation question
Ok that makes sense. Thanks much. Uma On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano wrote: > On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan > wrote: >> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano wrote: >>> >>> A top-level superproject can have a submodule bound at its "dir/" >>> directory, and "dir/.git" can either be a gitfile which you can read >>> with read_gitfile() and point into somewhere in ".git/modules/" of >>> the top-level superproject. "dir/.git" can _ALSO_ be a fully valid >>> Git directory. So at the top of a superproject, you could do >>> >>> git clone $URL ./dir2 >>> git add dir2 >>> >>> to clone an independent project into dir2 directory, and add it as a >>> new submodule. The fallback is to support such a layout. >>> >> Thanks for the reply. However, in this case >> >> git clone $URL ./dir2 >> git add dir2 >> >> how will "dir2" get ever get registered as a submodule? > > With a separate invocation of "git config -f .gitmodules", of course. > The layout to use gitfile to point into .git/modules/ is a more recent > invention than the submodule support itself that "git add" knows about. > The code needs to support both layout as well as it can, and that > is what the "can we read it as gitfile? If not, that directory itself may > be a git repository" codepath you asked about is doing.
Re: git submodules implementation question
Thanks for the reply. However, in this case git clone $URL ./dir2 git add dir2 how will "dir2" get ever get registered as a submodule? I don't see how one can reach the "is_submodule_modified" routine for the scenario above. My understanding is that a sub-directory can be registered as a submodule only with the "git submodule add " command. In this case only a gitlink file is created within the sub-directory and not a .git subdirectory. Please correct me if I am wrong. Thanks again, Uma On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano wrote: > Uma Srinivasan writes: > >> git_dir = read_gitfile(buf.buf); >> if (!git_dir) >> >> git_dir = buf.buf; >> >> Can anyone explain to me why we are replacing a failed reading of a >> git file with the original sub directory name? > > A top-level superproject can have a submodule bound at its "dir/" > directory, and "dir/.git" can either be a gitfile which you can read > with read_gitfile() and point into somewhere in ".git/modules/" of > the top-level superproject. "dir/.git" can _ALSO_ be a fully valid > Git directory. So at the top of a superproject, you could do > > git clone $URL ./dir2 > git add dir2 > > to clone an independent project into dir2 directory, and add it as a > new submodule. The fallback is to support such a layout. >
Re: git submodules implementation question
On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan wrote: > On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano wrote: >> >> A top-level superproject can have a submodule bound at its "dir/" >> directory, and "dir/.git" can either be a gitfile which you can read >> with read_gitfile() and point into somewhere in ".git/modules/" of >> the top-level superproject. "dir/.git" can _ALSO_ be a fully valid >> Git directory. So at the top of a superproject, you could do >> >> git clone $URL ./dir2 >> git add dir2 >> >> to clone an independent project into dir2 directory, and add it as a >> new submodule. The fallback is to support such a layout. >> > Thanks for the reply. However, in this case > > git clone $URL ./dir2 > git add dir2 > > how will "dir2" get ever get registered as a submodule? With a separate invocation of "git config -f .gitmodules", of course. The layout to use gitfile to point into .git/modules/ is a more recent invention than the submodule support itself that "git add" knows about. The code needs to support both layout as well as it can, and that is what the "can we read it as gitfile? If not, that directory itself may be a git repository" codepath you asked about is doing.
Re: git submodules implementation question
Uma Srinivasan writes: > git_dir = read_gitfile(buf.buf); > if (!git_dir) > > git_dir = buf.buf; > > Can anyone explain to me why we are replacing a failed reading of a > git file with the original sub directory name? A top-level superproject can have a submodule bound at its "dir/" directory, and "dir/.git" can either be a gitfile which you can read with read_gitfile() and point into somewhere in ".git/modules/" of the top-level superproject. "dir/.git" can _ALSO_ be a fully valid Git directory. So at the top of a superproject, you could do git clone $URL ./dir2 git add dir2 to clone an independent project into dir2 directory, and add it as a new submodule. The fallback is to support such a layout.
git submodules implementation question
Hi, I am new to git source and internal development. Recently I've been looking at a problem where issuing "git status" in a corrupted workspace handed over to me by a user, forks several thousand child processes recursively. The symptoms of the corrupted workspace to reproduce this problem are as follows: there needs to be an entry in the index file that looks something like this 0 groc 16 0 0.0 Also, there's a subdirectory under the source dir with the name in the index file above (i.e 'groc') that has a partially populated .git sub directory within it, not a .git file with the contents providing a link to the module subdirectory. The "git submodule status" command returns "No submodule mapping found in .gitmodules for path 'groc'" Doing a "git read-tree HEAD" will regenerate the index file and make the problem go away. Basically the line entry above relating to 'groc' goes away as it is not a submodule in the source repo. This problem can be reproduced with all current versions of GIT including 2.4, 2.8 and the latest version v2.10.0-rc1. In looking into the git source code for is_submodule_modified() which forks off the new child process, I see the following lines in submodule.c: git_dir = read_gitfile(buf.buf); if (!git_dir) git_dir = buf.buf; Can anyone explain to me why we are replacing a failed reading of a git file with the original sub directory name? When can we expect a .git directory under a submodule? Thanks, Uma -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html