Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a bit subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > submodule_has_commits()
> 
> Nice.

OK, I will use that. While I am at it, I will also rename all the
'hashes' variables to 'commits', because I think that makes the code
much clearer.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know". 
> 
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes, you could also see it that way. But the question is: if we do not
have what the remote needs, would the user expect us to tell him that
and stop, or does he usually not care?

> > So how about:
> >
> >
> > 	if (!submodule_has_hashes(path, hashes))
> > 		/* NEEDSWORK: The correct answer here is "We do not
> > 		 * know" instead of "No". We currently proceed pushing
> > 		 * here as if the submodule's commits are available on a
> > 		 * remote, which is not always correct. */
> > 		return 0;
> 
> I am not sure.  
> 
> What should happen in this scenario?
> 
>  * We have two remotes, A and B, for our superproject.
> 
>  * We are not interested in one submodule at path X.  Our repository
>    is primarily used to work on the superproject and possibly other
>    submodules but not the one at path X.
> 
>  * We pulled from A to update ourselves.  They were actively working
>    on the submodule we are not interested in, and path X in the
>    superproject records a new commit that we do not have.
> 
>  * We are now trying to push to B.

I am not sure this is a typical scenario. Well, maybe it is if you are
updating your main branch from someone else and then pushing it to your
own fork. You could specify --no-recurse-submodules for this case, though.
The proper solution would probably be something along the lines of
'submodule.<name>.fetchRecurseSubmodules', but for push, so that we can
mark certain submodules as uninteresting by default.

I would like to be more protective of the user here. It is usually more
annoying, for possibly many others, when you push out a superproject that
references missing submodule commits than when one person cannot push
because his submodule is not up to date or not initialized.

> Should different things happen in these two subcases?
> 
>  - We are not interested in submodule at path X, so we haven't even
>    done "submodule init" on it.
> 
>  - We are not interested in submodule at path X, so even though we
>    do have a rather stale clone of it, we do not usually bother
>    updating what is checked out at path X and commit our changes
>    outside that area.
> 
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as then
> the only remedy for the situation is for you to make a full
> clone of the submodule you are not interested in and you have never
> touched yourself in either of these two subcases.

I also think the same thing should happen in both situations. A decision
that something different should happen should be made explicitly, not
implicitly just because some submodule is not initialized. That might
be by accident, or because a certain submodule is new, so the choice here
should be made deliberately by the user, IMO.

Cheers Heiko


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-12 Thread Junio C Hamano
Heiko Voigt  writes:

>> If we do not even have these commits locally, then there is no point
>> attempting to push, so returning 0 (i.e. it is not "needs pushing"
>> situation) is correct but it is a bit subtle.  It's not "we know
>> they already have them", but it is "even if we tried to push, it
>> won't do us or the other side any good."  A single-liner in-code
>> comment may help.
>
> First the naming part. How about:
>
>   submodule_has_commits()

Nice.

> Returning 0 here means: "No push needed" but the correct answer would
> be: "We do not know". 

Is it?  Perhaps I am misreading the "submodule-has-commits"; I
thought it was "the remote may or may not need updating, but we
ourselves don't have what they may need to have commits in their
submodule that are referenced by their superproject, so it would not
help them even if we pushed our submodule to them".  It indeed is
different from "No push needed" (rather, "our pushing would be
pointless").

> So how about:
>
>
> 	if (!submodule_has_hashes(path, hashes))
> 		/* NEEDSWORK: The correct answer here is "We do not
> 		 * know" instead of "No". We currently proceed pushing
> 		 * here as if the submodule's commits are available on a
> 		 * remote, which is not always correct. */
> 		return 0;

I am not sure.  

What should happen in this scenario?

 * We have two remotes, A and B, for our superproject.

 * We are not interested in one submodule at path X.  Our repository
   is primarily used to work on the superproject and possibly other
   submodules but not the one at path X.

 * We pulled from A to update ourselves.  They were actively working
   on the submodule we are not interested in, and path X in the
   superproject records a new commit that we do not have.

 * We are now trying to push to B.

Should different things happen in these two subcases?

 - We are not interested in submodule at path X, so we haven't even
   done "submodule init" on it.

 - We are not interested in submodule at path X, so even though we
   do have a rather stale clone of it, we do not usually bother
   updating what is checked out at path X and commit our changes
   outside that area.

I tend to think that in these two cases the same thing should
happen.  I am not sure if that same thing should be rejection
(i.e. "you do not know for sure that the commit at path X of the
superproject you are pushing exists in the submodule repository at
the receiving end, so I'd refuse to push the superproject"), as then
the only remedy for the situation is for you to make a full
clone of the submodule you are not interested in and you have never
touched yourself in either of these two subcases.




Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:56:13PM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> > +static int check_has_hash(const unsigned char sha1[20], void *data)
> >  {
> > -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> > +	int *has_hash = (int *) data;
> > +
> > +	if (!lookup_commit_reference(sha1))
> > +		*has_hash = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> > +{
> > +	int has_hash = 1;
> > +
> > +	if (add_submodule_odb(path))
> > +		return 0;
> > +
> > +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> > +	return has_hash;
> > +}
> > +
> > +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> > +{
> > +	if (!submodule_has_hashes(path, hashes))
> >  		return 0;
> 
> Same comment about naming.  
> 
> What do check-has-hash and submodule-has-hashes exactly mean by
> "hash" in their names?  Because I think what is checked here is
> "does the local submodule repository have _all_ the commits
> referenced from the superproject commit we are pushing?", so I'd
> prefer to see "commit" in their names.
> 
> If we do not even have these commits locally, then there is no point
> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> situation) is correct but it is a bit subtle.  It's not "we know
> they already have them", but it is "even if we tried to push, it
> won't do us or the other side any good."  A single-liner in-code
> comment may help.

First the naming part. How about:

submodule_has_commits()

?

Second, as mentioned in a previous answer[1] on this part: I would actually
like to have a die() here instead of blindly proceeding. Since the user
either specified --recurse-submodules=... on the command line or it was
implicitly enabled because we have submodules in the tree, we should be
careful not to push revisions that reference submodule commits which are
not available at a remote. If we cannot properly figure it out, I would
suggest stopping and telling the user how to resolve the situation: either
she clones the appropriate submodules, or she specifies
--no-recurse-submodules on the command line to tell git that she does not
care.

Returning 0 here means "No push needed", but the correct answer would
be "We do not know". The question is what we should do here, which I am
planning to address in a separate patch series, since that will change
behavior.

So how about:


	if (!submodule_has_hashes(path, hashes))
		/* NEEDSWORK: The correct answer here is "We do not
		 * know" instead of "No". We currently proceed pushing
		 * here as if the submodule's commits are available on a
		 * remote, which is not always correct. */
		return 0;

What do you think?

Cheers Heiko

[1] http://public-inbox.org/git/20160919195812.gc62...@book.hvoigt.net/
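
To make the "No" versus "We do not know" distinction discussed above
concrete, here is a minimal, self-contained sketch. It is not code from
the patch or from git: the enums, the function names and the `strict`
flag are all hypothetical, and `strict` merely stands in for whatever
policy (die() or proceed) a later series might choose.

```c
#include <assert.h>

/* Hypothetical three-way answer; the patch itself only returns 0 or 1. */
enum push_check {
	PUSH_NOT_NEEDED, /* all commits are local and already on a remote */
	PUSH_NEEDED,     /* all commits are local, some are not on a remote */
	PUSH_UNKNOWN     /* some commits are missing locally: we cannot tell */
};

/* Hypothetical actions a caller could take for one submodule. */
enum push_action { ACTION_SKIP, ACTION_PUSH, ACTION_ABORT };

/* Map the two facts we can observe locally onto the three-way answer. */
static enum push_check classify(int has_all_commits, int has_unpushed)
{
	if (!has_all_commits)
		return PUSH_UNKNOWN;
	return has_unpushed ? PUSH_NEEDED : PUSH_NOT_NEEDED;
}

/*
 * strict != 0 models the "stop and tell the user" preference;
 * strict == 0 models the behavior described in the NEEDSWORK comment,
 * where "unknown" is treated like "no push needed".
 */
static enum push_action decide(enum push_check check, int strict)
{
	switch (check) {
	case PUSH_NEEDED:
		return ACTION_PUSH;
	case PUSH_NOT_NEEDED:
		return ACTION_SKIP;
	case PUSH_UNKNOWN:
		return strict ? ACTION_ABORT : ACTION_SKIP;
	}
	assert(!"unhandled push_check value");
	return ACTION_ABORT;
}
```

With such a split, the uninitialized-submodule and stale-clone cases both
end up as PUSH_UNKNOWN, and the only open question is which action that
state maps to.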


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-10 Thread Junio C Hamano
Heiko Voigt  writes:

> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_hash = (int *) data;
> +
> +	if (!lookup_commit_reference(sha1))
> +		*has_hash = 0;
> +
> +	return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +	int has_hash = 1;
> +
> +	if (add_submodule_odb(path))
> +		return 0;
> +
> +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +	return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +	if (!submodule_has_hashes(path, hashes))
>  		return 0;

Same comment about naming.  

What do check-has-hash and submodule-has-hashes exactly mean by
"hash" in their names?  Because I think what is checked here is
"does the local submodule repository have _all_ the commits
referenced from the superproject commit we are pushing?", so I'd
prefer to see "commit" in their names.

If we do not even have these commits locally, then there is no point
attempting to push, so returning 0 (i.e. it is not "needs pushing"
situation) is correct but it is a bit subtle.  It's not "we know
they already have them", but it is "even if we tried to push, it
won't do us or the other side any good."  A single-liner in-code
comment may help.

Thanks.
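
For readers following the naming discussion, the check being named can
be sketched outside of git as follows. This is only an illustration
under stated assumptions: `struct id_list`, `for_each_id()` and the
string-based "object store" are hypothetical stand-ins for sha1_array,
sha1_array_for_each_unique() and lookup_commit_reference(), and unlike
the real iterator this one does not skip duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sha1_array: a plain list of hex ids. */
struct id_list {
	const char **ids;
	size_t nr;
};

typedef int (*for_each_id_fn)(const char *id, void *data);

/* Call fn for every id; stop early if fn returns non-zero. */
static int for_each_id(const struct id_list *list, for_each_id_fn fn,
		       void *data)
{
	size_t i;

	assert(list != NULL);
	for (i = 0; i < list->nr; i++) {
		int ret = fn(list->ids[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback state: what the local clone has, and the running answer. */
struct check_ctx {
	const struct id_list *local;
	int has_all; /* cleared when a wanted commit is missing */
};

static int local_has(const struct id_list *local, const char *id)
{
	size_t i;

	for (i = 0; i < local->nr; i++)
		if (!strcmp(local->ids[i], id))
			return 1;
	return 0;
}

static int check_has_commit(const char *id, void *data)
{
	struct check_ctx *ctx = data;

	if (!local_has(ctx->local, id))
		ctx->has_all = 0;
	return 0; /* keep iterating, like the callback in the patch */
}

/* 1 only if every wanted commit is present in the local clone. */
static int submodule_has_commits(const struct id_list *local,
				 const struct id_list *wanted)
{
	struct check_ctx ctx = { local, 1 };

	for_each_id(wanted, check_has_commit, &ctx);
	return ctx.has_all;
}
```

The result answers "do we have _all_ of them locally?", which is why a 0
from it says nothing about what the remote has.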


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-07 Thread Stefan Beller
On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt  wrote:
> We run a command for each sha1 change in a submodule. This is
> unnecessary since we can simply batch all sha1's we want to check into
> one command. Let's do it so we can speed up the check when many submodule
> changes are in need of checking.
>
> Signed-off-by: Heiko Voigt 
> ---
>  submodule.c | 63 ++++++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 5044afc2f8..a05c2a34b1 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
>  	return 0;
>  }
>
> -static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
> +static int check_has_hash(const unsigned char sha1[20], void *data)
>  {
> -	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
> +	int *has_hash = (int *) data;
> +
> +	if (!lookup_commit_reference(sha1))
> +		*has_hash = 0;
> +
> +	return 0;
> +}
> +
> +static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
> +{
> +	int has_hash = 1;
> +
> +	if (add_submodule_odb(path))
> +		return 0;
> +
> +	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
> +	return has_hash;
> +}
> +
> +static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
> +{
> +	if (!submodule_has_hashes(path, hashes))

So the above is an implicit lookup already, but we did that before,
too, so it's fine.

> @@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
>  	argv_array_clear(&argv);
>
>  	for (i = 0; i < submodules.nr; i++) {
> -		struct string_list_item *item = &submodules.items[i];
> -		struct collect_submodule_from_sha1s_data data;
> -		data.submodule_path = item->string;
> -		data.needs_pushing = needs_pushing;
> -		sha1_array_for_each_unique((struct sha1_array *) item->util,
> -				collect_submodules_from_sha1s,
> -				&data);
> +		struct string_list_item *submodule = &submodules.items[i];
> +		struct sha1_array *hashes = (struct sha1_array *) submodule->util;
> +
> +		if (submodule_needs_pushing(submodule->string, hashes))
> +			string_list_insert(needs_pushing, submodule->string);

That makes sense.

Thanks!
Stefan
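
As an aside on what "batching" means here: instead of one child process
per commit, all ids go onto a single `rev-list ... --not --remotes -n 1`
command line. The sketch below shows only that argument-building step in
isolation. `struct arg_list`, `arg_push()` and `build_rev_list_args()`
are hypothetical stand-ins written for this example, not git's
argv_array API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, minimal growable argv; always NULL-terminated. */
struct arg_list {
	const char **argv;
	size_t argc;
	size_t alloc;
};

static void arg_push(struct arg_list *args, const char *arg)
{
	assert(args != NULL);
	/* Keep room for the new entry plus the terminating NULL. */
	if (args->argc + 2 > args->alloc) {
		size_t alloc = args->alloc ? args->alloc * 2 : 8;
		const char **grown = realloc(args->argv,
					     alloc * sizeof(*grown));
		if (!grown)
			abort();
		args->argv = grown;
		args->alloc = alloc;
	}
	args->argv[args->argc++] = arg;
	args->argv[args->argc] = NULL;
}

/*
 * Build one command line covering all ids:
 *   rev-list <id1> <id2> ... --not --remotes -n 1
 * Any output from such a command means at least one commit is
 * reachable from the ids but not from a remote-tracking ref.
 */
static void build_rev_list_args(struct arg_list *args,
				const char **ids, size_t nr)
{
	size_t i;

	arg_push(args, "rev-list");
	for (i = 0; i < nr; i++)
		arg_push(args, ids[i]);
	arg_push(args, "--not");
	arg_push(args, "--remotes");
	arg_push(args, "-n");
	arg_push(args, "1");
}
```

The caller owns `args->argv` and frees it with free(); the strings
themselves are borrowed, so they must outlive the list.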


[PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-07 Thread Heiko Voigt
We run a command for each sha1 change in a submodule. This is
unnecessary since we can simply batch all sha1's we want to check into
one command. Let's do it so we can speed up the check when many submodule
changes are in need of checking.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 63 ++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/submodule.c b/submodule.c
index 5044afc2f8..a05c2a34b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -529,27 +529,49 @@ static int append_hash_to_argv(const unsigned char sha1[20], void *data)
 	return 0;
 }
 
-static int submodule_needs_pushing(const char *path, const unsigned char sha1[20])
+static int check_has_hash(const unsigned char sha1[20], void *data)
 {
-	if (add_submodule_odb(path) || !lookup_commit_reference(sha1))
+	int *has_hash = (int *) data;
+
+	if (!lookup_commit_reference(sha1))
+		*has_hash = 0;
+
+	return 0;
+}
+
+static int submodule_has_hashes(const char *path, struct sha1_array *hashes)
+{
+	int has_hash = 1;
+
+	if (add_submodule_odb(path))
+		return 0;
+
+	sha1_array_for_each_unique(hashes, check_has_hash, &has_hash);
+	return has_hash;
+}
+
+static int submodule_needs_pushing(const char *path, struct sha1_array *hashes)
+{
+	if (!submodule_has_hashes(path, hashes))
 		return 0;
 
 	if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
 		struct child_process cp = CHILD_PROCESS_INIT;
-		const char *argv[] = {"rev-list", NULL, "--not", "--remotes", "-n", "1" , NULL};
 		struct strbuf buf = STRBUF_INIT;
 		int needs_pushing = 0;
 
-		argv[1] = sha1_to_hex(sha1);
-		cp.argv = argv;
+		argv_array_push(&cp.args, "rev-list");
+		sha1_array_for_each_unique(hashes, append_hash_to_argv, &cp.args);
+		argv_array_pushl(&cp.args, "--not", "--remotes", "-n", "1" , NULL);
+
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
 		cp.out = -1;
 		cp.dir = path;
 		if (start_command(&cp))
-			die("Could not run 'git rev-list %s --not --remotes -n 1' command in submodule %s",
-				sha1_to_hex(sha1), path);
+			die("Could not run 'git rev-list  --not --remotes -n 1' command in submodule %s",
+				path);
 		if (strbuf_read(&buf, cp.out, 41))
 			needs_pushing = 1;
 		finish_command(&cp);
@@ -604,21 +626,6 @@ static void find_unpushed_submodule_commits(struct commit *commit,
 	diff_tree_combined_merge(commit, 1, &rev);
 }
 
-struct collect_submodule_from_sha1s_data {
-	char *submodule_path;
-	struct string_list *needs_pushing;
-};
-
-static void collect_submodules_from_sha1s(const unsigned char sha1[20],
-		void *data)
-{
-	struct collect_submodule_from_sha1s_data *me =
-		(struct collect_submodule_from_sha1s_data *) data;
-
-	if (submodule_needs_pushing(me->submodule_path, sha1))
-		string_list_insert(me->needs_pushing, me->submodule_path);
-}
-
 static void free_submodules_sha1s(struct string_list *submodules)
 {
 	int i;
@@ -658,13 +665,11 @@ int find_unpushed_submodules(struct sha1_array *hashes,
 	argv_array_clear(&argv);
 
 	for (i = 0; i < submodules.nr; i++) {
-		struct string_list_item *item = &submodules.items[i];
-		struct collect_submodule_from_sha1s_data data;
-		data.submodule_path = item->string;
-		data.needs_pushing = needs_pushing;
-		sha1_array_for_each_unique((struct sha1_array *) item->util,
-				collect_submodules_from_sha1s,
-				&data);
+		struct string_list_item *submodule = &submodules.items[i];
+		struct sha1_array *hashes = (struct sha1_array *) submodule->util;
+
+		if (submodule_needs_pushing(submodule->string, hashes))
+			string_list_insert(needs_pushing, submodule->string);
 	}
 	free_submodules_sha1s(&submodules);
 
-- 
2.10.1.637.g09b28c5