Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt writes:
>
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a bit subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > 	submodule_has_commits()
>
> Nice.

Ok will use that. And while I am at it: I will also rename all the
'hashes' variables to commits because that makes the code way clearer I
think.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know".
>
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes, you could also rephrase/see it that way. But the question is: If we
do not have what the remote needs, would the user expect us to tell him
that fact and stop, or does he usually not care?

> > So how about:
> >
> >
> > 	if (!submodule_has_hashes(path, hashes))
> > 		/* NEEDSWORK: The correct answer here is "We do not
> > 		 * know" instead of "No". We currently proceed pushing
> > 		 * here as if the submodule's commits are available on a
> > 		 * remote, which is not always correct. */
> > 		return 0;
>
> I am not sure.
>
> What should happen in this scenario?
>
>  * We have two remotes, A and B, for our superproject.
>
>  * We are not interested in one submodule at path X.  Our repository
>    is primarily used to work on the superproject and possibly other
>    submodules but not the one at path X.
>
>  * We pulled from A to update ourselves.  They were actively working
>    on the submodule we are not interested in, and path X in the
>    superproject records a new commit that we do not have.
>
>  * We are now trying to push to B.

I am not sure whether this is a typical scenario. Well, if you are
updating your main branch from someone else and then push it to your own
fork, maybe. You could specify --no-recurse-submodules for this case
though. The proper solution for this case would probably be something
along the lines of 'submodule.<name>.fetchRecurseSubmodules' but for
push, so we can mark certain submodules as uninteresting by default.

I would like to be more protective of the user here. It's usually more
annoying for possibly many others when you push out things that have
missing things compared to one person not being able to push because his
submodule is not up-to-date/initialized.

> Should different things happen in these two subcases?
>
>  - We are not interested in submodule at path X, so we haven't even
>    done "submodule init" on it.
>
>  - We are not interested in submodule at path X, so even though we
>    do have a rather stale clone of it, we do not usually bother
>    updating what is checked out at path X and commit our changes
>    outside that area.
>
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as the
> only remedy for the situation would then be for you to make a full
> clone of a submodule you are not interested in and have never
> touched yourself in either of these two subcases.

I also think that in both situations the same thing should happen. A
decision that something different should happen should be made
explicitly instead of implicitly just because some submodule is not
initialized. That might be by accident or because a certain submodule is
new, so here the choice should be made deliberately by the user, IMO.

Cheers Heiko
Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
Heiko Voigt writes:

>> If we do not even have these commits locally, then there is no point
>> attempting to push, so returning 0 (i.e. it is not "needs pushing"
>> situation) is correct but it is a bit subtle.  It's not "we know
>> they already have them", but it is "even if we tried to push, it
>> won't do us or the other side any good."  A single-liner in-code
>> comment may help.
>
> First the naming part. How about:
>
> 	submodule_has_commits()

Nice.

> Returning 0 here means: "No push needed" but the correct answer would
> be: "We do not know".

Is it?  Perhaps I am misreading the "submodule-has-commits"; I
thought it was "the remote may or may not need updating, but we
ourselves don't have what they may need to have commits in their
submodule that are referenced by their superproject, so it would not
help them even if we pushed our submodule to them".  It indeed is
different from "No push needed" (rather, "our pushing would be
pointless").

> So how about:
>
>
> 	if (!submodule_has_hashes(path, hashes))
> 		/* NEEDSWORK: The correct answer here is "We do not
> 		 * know" instead of "No". We currently proceed pushing
> 		 * here as if the submodule's commits are available on a
> 		 * remote, which is not always correct. */
> 		return 0;

I am not sure.

What should happen in this scenario?

 * We have two remotes, A and B, for our superproject.

 * We are not interested in one submodule at path X.  Our repository
   is primarily used to work on the superproject and possibly other
   submodules but not the one at path X.

 * We pulled from A to update ourselves.  They were actively working
   on the submodule we are not interested in, and path X in the
   superproject records a new commit that we do not have.

 * We are now trying to push to B.

Should different things happen in these two subcases?

 - We are not interested in submodule at path X, so we haven't even
   done "submodule init" on it.

 - We are not interested in submodule at path X, so even though we
   do have a rather stale clone of it, we do not usually bother
   updating what is checked out at path X and commit our changes
   outside that area.

I tend to think that in these two cases the same thing should
happen.  I am not sure if that same thing should be rejection
(i.e. "you do not know for sure that the commit at path X of the
superproject you are pushing exists in the submodule repository at
the receiving end, so I'd refuse to push the superproject"), as the
only remedy for the situation would then be for you to make a full
clone of a submodule you are not interested in and have never
touched yourself in either of these two subcases.
Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
On Mon, Oct 10, 2016 at 03:56:13PM -0700, Junio C Hamano wrote:
> Heiko Voigt writes:
>
> > -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> > +static int check_has_hash(const unsigned char sha1[20], void *data)
> >  {
> > -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +	int *has_hash = (int *) data;
> > +
> > +	if (!lookup_commit_reference(sha1))
> > +		*has_hash = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> > +{
> > +	int has_hash = 1;
> > +
> > +	if (add_submodule_odb(path))
> > +		return 0;
> > +
> > +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> > +	return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> > +{
> > +	if (!submodule_has_hashes(path, hashes))
> >  		return 0;
>
> Same comment about naming.
>
> What do check-has-hash and submodule-has-hashes exactly mean by
> "hash" in their names?  Because I think what is checked here is
> "does the local submodule repository have _all_ the commits
> referenced from the superproject commit we are pushing?", so I'd
> prefer to see "commit" in their names.
>
> If we do not even have these commits locally, then there is no point
> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> situation) is correct but it is a bit subtle.  It's not "we know
> they already have them", but it is "even if we tried to push, it
> won't do us or the other side any good."  A single-liner in-code
> comment may help.

First the naming part. How about:

	submodule_has_commits()

?

Second, as mentioned in a previous answer[1] to this part: I would
actually like to have a die() here instead of blindly proceeding. Since
the user either specified --recurse-submodules=... on the command line
or it was implicitly enabled because we have submodules in the tree, we
should be careful and not push revisions referencing submodules that are
not available at a remote. If we cannot properly figure it out, I would
suggest to stop and tell the user how to solve the situation. E.g.
either she clones the appropriate submodules or specifies
--no-recurse-submodules on the command line to tell git that she does
not care.

Returning 0 here means: "No push needed" but the correct answer would
be: "We do not know". The question is what we should do here, which I am
planning to address in a separate patch series since that will be
changing behavior.

So how about:

	if (!submodule_has_hashes(path, hashes))
		/* NEEDSWORK: The correct answer here is "We do not
		 * know" instead of "No". We currently proceed pushing
		 * here as if the submodule's commits are available on a
		 * remote, which is not always correct. */
		return 0;

What do you think?

Cheers Heiko

[1] http://public-inbox.org/git/20160919195812.gc62...@book.hvoigt.net/
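
The difference between "No push needed" and "We do not know" can be made
explicit with a three-valued result instead of a 0/1 return. The
following is only a sketch of that idea, not code from the patch: the
enum, classify_submodule(), should_abort() and the "strict" flag are
hypothetical names introduced here for illustration.

```c
/* Hypothetical three-valued answer to "does this submodule need pushing?" */
enum push_check {
	PUSH_NOT_NEEDED = 0,	/* all commits are known to be on a remote */
	PUSH_NEEDED,		/* some commit is not on any remote */
	PUSH_UNKNOWN		/* we lack the commits locally, cannot tell */
};

/*
 * has_all_commits: result of a check like submodule_has_hashes()
 * has_unpushed:    result of a check like the batched rev-list call
 */
static enum push_check classify_submodule(int has_all_commits, int has_unpushed)
{
	if (!has_all_commits)
		return PUSH_UNKNOWN;
	return has_unpushed ? PUSH_NEEDED : PUSH_NOT_NEEDED;
}

/*
 * Policy hook (assumption): in "strict" mode an unknown answer stops the
 * push, otherwise it is treated like the current "return 0" behavior.
 */
static int should_abort(enum push_check result, int strict)
{
	return result == PUSH_UNKNOWN && strict;
}
```

With such a split, the caller rather than the lookup helper would decide
whether an unknown answer leads to a die() with advice or to silently
proceeding.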
Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
Heiko Voigt writes:

> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_hash = (int *) data;
> +
> +	if (!lookup_commit_reference(sha1))
> +		*has_hash = 0;
> +
> +	return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +	int has_hash = 1;
> +
> +	if (add_submodule_odb(path))
> +		return 0;
> +
> +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +	return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +	if (!submodule_has_hashes(path, hashes))
>  		return 0;

Same comment about naming.

What do check-has-hash and submodule-has-hashes exactly mean by
"hash" in their names?  Because I think what is checked here is
"does the local submodule repository have _all_ the commits
referenced from the superproject commit we are pushing?", so I'd
prefer to see "commit" in their names.

If we do not even have these commits locally, then there is no point
attempting to push, so returning 0 (i.e. it is not "needs pushing"
situation) is correct but it is a bit subtle.  It's not "we know
they already have them", but it is "even if we tried to push, it
won't do us or the other side any good."  A single-liner in-code
comment may help.

Thanks.
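
The "do we have _all_ the commits?" check quoted above relies on a
callback that clears a flag whenever one lookup fails. A self-contained
sketch of that pattern follows. It does not use git's internals:
commit_exists(), for_each_unique() and has_all_commits() are hypothetical
stand-ins for lookup_commit_reference(), sha1_array_for_each_unique() and
the proposed submodule_has_commits(), and hex strings replace the raw
sha1 arrays.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in object database: a commit "exists" if it is listed here. */
static const char *known_commits[] = { "aaaa", "bbbb", "cccc" };

static int commit_exists(const char *hex)
{
	size_t i;

	for (i = 0; i < sizeof(known_commits) / sizeof(known_commits[0]); i++)
		if (!strcmp(known_commits[i], hex))
			return 1;
	return 0;
}

typedef int (*for_each_fn)(const char *hex, void *data);

/*
 * Call fn once per distinct entry of a sorted array; stop early if fn
 * returns non-zero.
 */
static int for_each_unique(const char **sorted, size_t nr,
			   for_each_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret;

		if (i > 0 && !strcmp(sorted[i], sorted[i - 1]))
			continue;	/* skip duplicates */
		ret = fn(sorted[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: clear the caller's flag when a commit is missing. */
static int check_has_commit(const char *hex, void *data)
{
	int *has_commit = data;

	if (!commit_exists(hex))
		*has_commit = 0;
	return 0;
}

/* Returns 1 only if every listed commit is present locally. */
static int has_all_commits(const char **sorted, size_t nr)
{
	int has_commit = 1;

	for_each_unique(sorted, nr, check_has_commit, &has_commit);
	return has_commit;
}
```

Because the flag starts at 1 and is only ever cleared, an empty list
counts as "all commits present", which matches the behavior of the
quoted submodule_has_hashes() once the object database has been added.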
Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call
On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt wrote:
> We run a command for each sha1 change in a submodule. This is
> unnecessary since we can simply batch all sha1's we want to check into
> one command. Let's do it so we can speed up the check when many submodule
> changes are in need of checking.
>
> Signed-off-by: Heiko Voigt
> ---
>  submodule.c | 63 +
>  1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 5044afc2f8..a05c2a34b1 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
>  	return 0;
>  }
>
> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_hash = (int *) data;
> +
> +	if (!lookup_commit_reference(sha1))
> +		*has_hash = 0;
> +
> +	return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +	int has_hash = 1;
> +
> +	if (add_submodule_odb(path))
> +		return 0;
> +
> +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +	return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +	if (!submodule_has_hashes(path, hashes))

So the above is an implicit lookup already, but we did that before, too,
so it's fine.

> @@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
>  	argv_array_clear(&argv);
>
>  	for (i = 0; i < submodules.nr; i++) {
> -		struct string_list_item *item = &submodules.items[i];
> -		struct collect_submodule_from_sha1s_data data;
> -		data.submodule_path = item->string;
> -		data.needs_pushing = needs_pushing;
> -		sha1_array_for_each_unique((struct sha1_array *) item->util,
> -				collect_submodules_from_sha1s,
> -				&data);
> +		struct string_list_item *submodule = &submodules.items[i];
> +		struct sha1_array *hashes = (struct sha1_array *) submodule->util;
> +
> +		if (submodule_needs_pushing(submodule->string, hashes))
> +			string_list_insert(needs_pushing, submodule->string);

That makes sense.

Thanks!
Stefan
[PATCH v2 3/3] batch check whether submodule needs pushing into one call
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Let's do it so we can speed up the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt
---
 submodule.c | 63 +
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5044afc2f8..a05c2a34b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
 	return 0;
 }

-static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
+static int check_has_hash(const unsigned char sha1[20], void *data)
 {
-	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+	int *has_hash = (int *) data;
+
+	if (!lookup_commit_reference(sha1))
+		*has_hash = 0;
+
+	return 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+	int has_hash = 1;
+
+	if (add_submodule_odb(path))
+		return 0;
+
+	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
+	return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+	if (!submodule_has_hashes(path, hashes))
 		return 0;

 	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
 		struct child_process cp = CHILD_PROCESS_INIT;
-		const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL};
 		struct strbuf buf = STRBUF_INIT;
 		int needs_pushing = 0;

-		argv[1] = sha1_to_hex(sha1);
-		cp.argv = argv;
+		argv_array_push(&cp.args, "rev-list");
+		sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args);
+		argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL);
+
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
 		cp.out = -1;
 		cp.dir = path;
 		if (start_command(&cp))
-			die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s",
-				sha1_to_hex(sha1), path);
+			die("Could not run 'git rev-list --not --remotes -n 1' command in submodule %s",
+				path);
 		if (strbuf_read(&buf, cp.out, 41))
 			needs_pushing = 1;
 		finish_command(&cp);
@@ -604,21 +626,6 @@ static void find_unpushed_submodule_commits(struct commit *commit,
 	diff_tree_combined_merge(commit, 1, &rev);
 }

-struct collect_submodule_from_sha1s_data {
-	char *submodule_path;
-	struct string_list *needs_pushing;
-};
-
-static void collect_submodules_from_sha1s(const unsigned char sha1[20],
-		void *data)
-{
-	struct collect_submodule_from_sha1s_data *me =
-		(struct collect_submodule_from_sha1s_data *) data;
-
-	if (submodule_needs_pushing(me->submodule_path, sha1))
-		string_list_insert(me->needs_pushing, me->submodule_path);
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
 	int i;
@@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
 	argv_array_clear(&argv);

 	for (i = 0; i < submodules.nr; i++) {
-		struct string_list_item *item = &submodules.items[i];
-		struct collect_submodule_from_sha1s_data data;
-		data.submodule_path = item->string;
-		data.needs_pushing = needs_pushing;
-		sha1_array_for_each_unique((struct sha1_array *) item->util,
-				collect_submodules_from_sha1s,
-				&data);
+		struct string_list_item *submodule = &submodules.items[i];
+		struct sha1_array *hashes = (struct sha1_array *) submodule->util;
+
+		if (submodule_needs_pushing(submodule->string, hashes))
+			string_list_insert(needs_pushing, submodule->string);
 	}
 	free_submodules_sha1s(&submodules);

-- 
2.10.1.637.g09b28c5
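
The core of the change is that all commits of one submodule end up on a
single "rev-list ... --not --remotes -n 1" command line instead of one
command per commit. A standalone sketch of that batching step is shown
below. It uses plain malloc() and hex strings rather than git's
argv_array and sha1_array, and build_rev_list_argv() is a hypothetical
helper name, not a function in git.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Build a NULL-terminated argument vector of the form
 *   rev-list <unique hashes...> --not --remotes -n 1
 * from a sorted array of hex strings. Duplicate neighbours are skipped.
 * The caller frees the returned array (the strings are not copied).
 */
static const char **build_rev_list_argv(const char **sorted, size_t nr,
					size_t *argc_out)
{
	static const char *tail[] = { "--not", "--remotes", "-n", "1" };
	const size_t tail_nr = sizeof(tail) / sizeof(tail[0]);
	const char **argv;
	size_t argc = 0, i;

	/* "rev-list" + at most nr hashes + tail + terminating NULL */
	argv = malloc((1 + nr + tail_nr + 1) * sizeof(*argv));
	if (!argv)
		return NULL;

	argv[argc++] = "rev-list";
	for (i = 0; i < nr; i++) {
		if (i > 0 && !strcmp(sorted[i], sorted[i - 1]))
			continue;	/* each commit only once */
		argv[argc++] = sorted[i];
	}
	for (i = 0; i < tail_nr; i++)
		argv[argc++] = tail[i];
	argv[argc] = NULL;

	if (argc_out)
		*argc_out = argc;
	return argv;
}
```

Since "-n 1" limits the output to a single commit, any output at all from
the batched command means that at least one of the listed commits is not
reachable from a remote-tracking ref, which is all the caller needs to
know.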