Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Fri, Sep 16, 2016 at 11:13:09AM -0700, Junio C Hamano wrote: > Heiko Voigtwrites: > > The most exact solution would be to use all actual remote refs available > > (not sure if we have them at this point in the process?) another > > solution would be to still append the --remotes= option as a > > fallback to reduce the revisions. > > I'd say --remotes= is the least problematic thing to do. Ok then lets drop my last patch and keep it the way it was. Because if the remote sha1 differs we probably do not have it locally anyway. The only case this does not catch is when the user specifies a remote URL. But that just means we will iterate over all revisions instead of a reduced set, which makes the check slower but still correct. As one can see from my measurements that should not be that bad anymore. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Fri, Sep 16, 2016 at 10:59:37AM -0700, Junio C Hamano wrote: > Heiko Voigtwrites: > > +static void check_has_hash(const unsigned char sha1[20], void *data) > > +{ > > + int *has_hash = (int *) data; > > + > > + if (!lookup_commit_reference(sha1)) > > + *has_hash = 0; > > +} > > + > > +static int submodule_has_hashes(const char *path, struct sha1_array > > *hashes) > > +{ > > + int has_hash = 1; > > + > > + if (add_submodule_odb(path)) > > + return 0; > > + > > + sha1_array_for_each_unique(hashes, check_has_hash, _hash); > > + return has_hash; > > +} > > + > > +static int submodule_needs_pushing(const char *path, struct sha1_array > > *hashes) > > +{ > > + if (!submodule_has_hashes(path, hashes)) > > return 0; > > I think you meant well, but this optimization is wrong. A mere > presence of an object does not mean that the current tip can reach > that object. Imagine you pushed commit A earlier to them at the > tip, then pushed commit A~ to them at the tip, which is the current > state of the remote of the submodule, and since them they may have > GC'ed. They no longer have the commit A. > > For that matter, because you are doing this check by pretending as > if all the submodule objects are in the object store of the current > superproject you are working in, and saying "it exists there in the > submodule repository" when the only thing you know is it exists in > an object store of either the submodule repository, the superproject > repository, or any of the other submodule repositories, you really > cannot tell much from a mere presence of an object. Not just the > remote of the submodule repository you are interested in, but the > submodule repository you are interested in itself, may not have that > object. > > Drop the previous two helper functions and this short-cut. This is nothing I added. This came from the original code which I simply extended to check for all sha1's. The diff is a little bit misleading. This would be the local diff: - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + if (!submodule_has_hashes(path, hashes)) return 0; I think that this is more a shortcut for: If we can not find the sha1, we do not need to bother spawning a process for the real check. We default to the answer no in that case. It looks like the reason for the sha1 check is to avoid error output from the spawned 'rev-list' process. The error output might be confusing to the user in case the sha1 does not exist in the submodule. We should probably die here since we were not able to check a submodule that has a diff in the revisions we would push. After thinking about this further: I think it is actually bad to proceed here, as the current revisions (just in the superproject) would still be pushed and the user possibly gets an inconsistent state on the remote side which he tried to avoid with check/on-demand-push enabled. So in short this deserves some further love. Will have a look into adding another patch for this if we agree on my suggestion to die above. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
Heiko Voigtwrites: > On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: >> > By the way, with the two new patches, 'pu' seems to start failing >> > some tests, e.g. 5533 5404 5405. >> >> Ah ok I did only test on master, will look into those. > > Ok I had a look into these and the reason t5533 fails is because on pu > --recurse-submodules is enabled by default and I missed the case when > overwriting a ref. In that case we get the sha1 from the remote side as > old. So we could catch that and fall back to all revisions there, but... > > ... tl;dr: The solution to use the old revisions from the remote side > was too simple and does not make matters better but actually worse for > some typical usecases. Its only in the last patch. You may not even have the old one in your copy of the remote repository if you haven't fetched from them and you are forcing your push. "rev-list --not " may fail in such a case, not producing the list of new commits. You'd need to exclude old ones you learned over the wire that you do not yet have locally. > The most exact solution would be to use all actual remote refs available > (not sure if we have them at this point in the process?) another > solution would be to still append the --remotes= option as a > fallback to reduce the revisions. I'd say --remotes= is the least problematic thing to do.
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
Heiko Voigtwrites: > +static void append_hash_to_argv(const unsigned char sha1[20], void *data) > { > - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) > + struct argv_array *argv = (struct argv_array *) data; > + argv_array_push(argv, sha1_to_hex(sha1)); > +} Hmph, why do I think I've seen this before in the previous patch? ... scans through this patch and finds that a similar one is removed ;-) OK. This makes sense. > +static void check_has_hash(const unsigned char sha1[20], void *data) > +{ > + int *has_hash = (int *) data; > + > + if (!lookup_commit_reference(sha1)) > + *has_hash = 0; > +} > + > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) > +{ > + int has_hash = 1; > + > + if (add_submodule_odb(path)) > + return 0; > + > + sha1_array_for_each_unique(hashes, check_has_hash, _hash); > + return has_hash; > +} > + > +static int submodule_needs_pushing(const char *path, struct sha1_array > *hashes) > +{ > + if (!submodule_has_hashes(path, hashes)) > return 0; I think you meant well, but this optimization is wrong. A mere presence of an object does not mean that the current tip can reach that object. Imagine you pushed commit A earlier to them at the tip, then pushed commit A~ to them at the tip, which is the current state of the remote of the submodule, and since them they may have GC'ed. They no longer have the commit A. For that matter, because you are doing this check by pretending as if all the submodule objects are in the object store of the current superproject you are working in, and saying "it exists there in the submodule repository" when the only thing you know is it exists in an object store of either the submodule repository, the superproject repository, or any of the other submodule repositories, you really cannot tell much from a mere presence of an object. Not just the remote of the submodule repository you are interested in, but the submodule repository you are interested in itself, may not have that object. Drop the previous two helper functions and this short-cut. > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > struct child_process cp = CHILD_PROCESS_INIT; > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", > "-n", "1" , NULL}; > + > + argv_array_push(, "rev-list"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, > ); > + argv_array_pushl(, "--not", "--remotes", "-n", "1" , > NULL); > + > struct strbuf buf = STRBUF_INIT; > int needs_pushing = 0; > > - argv[1] = sha1_to_hex(sha1); > - cp.argv = argv; > prepare_submodule_repo_env(_array); > cp.git_cmd = 1; > cp.no_stdin = 1; > cp.out = -1; > cp.dir = path; > if (start_command()) > - die("Could not run 'git rev-list %s --not --remotes -n > 1' command in submodule %s", > - sha1_to_hex(sha1), path); > + die("Could not run 'git rev-list --not > --remotes -n 1' command in submodule %s", > + path); > if (strbuf_read(, cp.out, 41)) > needs_pushing = 1; > finish_command(); > @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct > commit *commit, > diff_tree_combined_merge(commit, 1, ); > } Good. This is the optimization I alluded to in the review of the first one in the series.
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Fri, Sep 16, 2016 at 11:40:19AM +0200, Heiko Voigt wrote: > > By the way, with the two new patches, 'pu' seems to start failing > > some tests, e.g. 5533 5404 5405. > > Ah ok I did only test on master, will look into those. Ok I had a look into these and the reason t5533 fails is because on pu --recurse-submodules is enabled by default and I missed the case when overwriting a ref. In that case we get the sha1 from the remote side as old. So we could catch that and fall back to all revisions there, but... ... tl;dr: The solution to use the old revisions from the remote side was too simple and does not make matters better but actually worse for some typical usecases. Its only in the last patch. ... that lead me to further thinking about the previous solution (using the locally cached remote refs) which might actually be a good default for the non-fastforward cases like creating new ref or overwriting a ref. My current patch would handle the --mirror case nicer, since it gets a lot of old revs to reduce the revisions to check. For the typical one branch push it would most likely be worse. Except when the user is updating (fast-forwarding) the remote ref we would scan all revs of a ref until the root because we do not get enough valid revs that already exist on the remote. The most exact solution would be to use all actual remote refs available (not sure if we have them at this point in the process?) another solution would be to still append the --remotes= option as a fallback to reduce the revisions. What do others think? Will leave this for a little bit further thinking. Its just the last patch ("use actual start hashes for submodule push check instead of local refs") that needs to go back to the drawing board. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
On Thu, Sep 15, 2016 at 02:08:58PM -0700, Junio C Hamano wrote: > Heiko Voigtwrites: > > > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > > struct child_process cp = CHILD_PROCESS_INIT; > > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", > > "-n", "1" , NULL}; > > + > > + argv_array_push(, "rev-list"); > > + sha1_array_for_each_unique(hashes, append_hash_to_argv, > > ); > > + argv_array_pushl(, "--not", "--remotes", "-n", "1" , > > NULL); > > + > > struct strbuf buf = STRBUF_INIT; > > int needs_pushing = 0; > > These two become decl-after-stmt; move your new lines a bit lower, > perhaps? Thanks, missed those. Will do. > > - argv[1] = sha1_to_hex(sha1); > > - cp.argv = argv; > > prepare_submodule_repo_env(_array); > > By the way, with the two new patches, 'pu' seems to start failing > some tests, e.g. 5533 5404 5405. Ah ok I did only test on master, will look into those. Cheers Heiko
Re: [PATCH 3/2] batch check whether submodule needs pushing into one call
Heiko Voigtwrites: > if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { > struct child_process cp = CHILD_PROCESS_INIT; > - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", > "-n", "1" , NULL}; > + > + argv_array_push(, "rev-list"); > + sha1_array_for_each_unique(hashes, append_hash_to_argv, > ); > + argv_array_pushl(, "--not", "--remotes", "-n", "1" , > NULL); > + > struct strbuf buf = STRBUF_INIT; > int needs_pushing = 0; These two become decl-after-stmt; move your new lines a bit lower, perhaps? > - argv[1] = sha1_to_hex(sha1); > - cp.argv = argv; > prepare_submodule_repo_env(_array); By the way, with the two new patches, 'pu' seems to start failing some tests, e.g. 5533 5404 5405.
[PATCH 3/2] batch check whether submodule needs pushing into one call
We run a command for each sha1 change in a submodule. This is unnecessary since we can simply batch all sha1's we want to check into one command. Lets do it so we can speedup the check when many submodule changes are in need of checking. Signed-off-by: Heiko Voigt--- On Wed, Sep 14, 2016 at 03:30:53PM -0700, Junio C Hamano wrote: > Heiko Voigt writes: > > > Sorry about the late reply. I was not able to process emails until now. > > Here are two patches that should help to improve the situation and batch > > up some processing. This one is for repositories with submodules, so > > that they do not iterate over the same submodule twice with the same > > hash. > > > > The second one will be the one people without submodules are interested > > in. > > Thanks. Will take a look at later as I'm already deep in today's > integration cycle. Very much appreciated. No problem. While I am at it: Here are actually another two patches that should make life of submodule users easier (push times of big pushes). In Numbers with the qt5[1] superproject and all submodules initialized. The same --mirror test as before with the git repository: # Without patch: book:qt5 hvoigt (5.6)$ rm -rf ~/Downloads/git-test && mkdir ~/Downloads/git-test && (cd ~/Downloads/git-test && git init) && time git push --mirror --recurse-submodules=check ~/Downloads/git-test real4m0.881s user3m30.139s sys 0m22.329s Without --recurse-submodules=check real0m0.251s user0m0.218s sys 0m0.082s # With patch: real0m1.167s user0m0.846s sys 0m0.262s real0m1.110s user0m0.815s sys 0m0.247s real0m1.111s user0m0.818s sys 0m0.251s Without --recurse-submodules=check real0m0.294s user0m0.221s sys 0m0.104s real0m0.248s user0m0.216s sys 0m0.080s real0m0.247s user0m0.212s sys 0m0.082s [1] git://code.qt.io/qt/qt5.git submodule.c | 75 - 1 file changed, 39 insertions(+), 36 deletions(-) diff --git a/submodule.c b/submodule.c index a15e346..28bb74e 100644 --- a/submodule.c +++ b/submodule.c @@ -522,27 +522,54 @@ static int has_remote(const char *refname, const struct object_id *oid, return 1; } -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20]) +static void append_hash_to_argv(const unsigned char sha1[20], void *data) { - if (add_submodule_odb(path) || !lookup_commit_reference(sha1)) + struct argv_array *argv = (struct argv_array *) data; + argv_array_push(argv, sha1_to_hex(sha1)); +} + +static void check_has_hash(const unsigned char sha1[20], void *data) +{ + int *has_hash = (int *) data; + + if (!lookup_commit_reference(sha1)) + *has_hash = 0; +} + +static int submodule_has_hashes(const char *path, struct sha1_array *hashes) +{ + int has_hash = 1; + + if (add_submodule_odb(path)) + return 0; + + sha1_array_for_each_unique(hashes, check_has_hash, _hash); + return has_hash; +} + +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes) +{ + if (!submodule_has_hashes(path, hashes)) return 0; if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; - const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL}; + + argv_array_push(, "rev-list"); + sha1_array_for_each_unique(hashes, append_hash_to_argv, ); + argv_array_pushl(, "--not", "--remotes", "-n", "1" , NULL); + struct strbuf buf = STRBUF_INIT; int needs_pushing = 0; - argv[1] = sha1_to_hex(sha1); - cp.argv = argv; prepare_submodule_repo_env(_array); cp.git_cmd = 1; cp.no_stdin = 1; cp.out = -1; cp.dir = path; if (start_command()) - die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s", - sha1_to_hex(sha1), path); + die("Could not run 'git rev-list --not --remotes -n 1' command in submodule %s", + path); if (strbuf_read(, cp.out, 41)) needs_pushing = 1; finish_command(); @@ -601,21 +628,6 @@ static void find_unpushed_submodule_commits(struct commit *commit, diff_tree_combined_merge(commit, 1, ); } -struct collect_submodule_from_sha1s_data { - char *submodule_path; - struct string_list *needs_pushing; -}; - -static void collect_submodules_from_sha1s(const unsigned char sha1[20], - void *data) -{ - struct collect_submodule_from_sha1s_data *me = - (struct