Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:18:28AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > Which seems quite extensively long for a static function so how about
> > we shorten it a bit and add a comment:
> >
> > /* lookup or create commit object list for submodule */
> > get_commit_objects_for_submodule_path(...
> 
> Or you can even lose "get_" and "path", I guess.  You are not even
> "getting" commits but the array that holds them, so the caller can
> use it to "get" one of them or it can even use it to "put" a new
> one, no?  "get-commit-objects" is a misnomer in that sense.  Either
> one of
> 
> get_submodule_commits_array()
> submodule_commits()
> 
> perhaps?  I dunno.

I like the last one. Will use 'submodule_commits()'.

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Junio C Hamano
Heiko Voigt  writes:

> Which seems quite extensively long for a static function so how about
> we shorten it a bit and add a comment:
>
>   /* lookup or create commit object list for submodule */
>   get_commit_objects_for_submodule_path(...

Or you can even lose "get_" and "path", I guess.  You are not even
"getting" commits but the array that holds them, so the caller can
use it to "get" one of them or it can even use it to "put" a new
one, no?  "get-commit-objects" is a misnomer in that sense.  Either
one of

get_submodule_commits_array()
submodule_commits()

perhaps?  I dunno.


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Mon, Oct 10, 2016 at 03:43:13PM -0700, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> >> +static struct sha1_array *get_sha1s_from_list(struct string_list 
> >> *submodules,
> >> +   const char *path)
> >
> > So this will take the stringlist `submodules` and insert the path into it,
> > if it wasn't already in there. In case it is newly inserted, add a 
> > sha1_array
> > as util, so each inserted path has it's own empty array.
> >
> > So it is both init of the data structures as well as retrieving them. I was
> > initially confused by the name as I assumed it would give you sha1s out
> > of a string list (e.g. transform strings to internal sha1 things).
> > Maybe it's just
> > me having a hard time to understand that, but I feel like the name could be
> > improved.
> >
> > lookup_sha1_list_by_path,
> > insert_path_and_return_sha1_list ?
> 
> I do not think either the name or the "find if exists otherwise
> initialize one" behaviour is particularly confusing, but I do not
> think "maintain a set of sha1_arrays keyed with a string" is a so
> widely reusable general concept/construct.  As can be seen easily in
> the names of parameters, this function is about maintaining a set of
> sha1_arrays keyed by paths to submodules, and I also assume that the
> array indexed by path is not meant to be a general purpose "we can
> use it to store any 40-hex thing" but to store something specific.
> 
> What is that specific thing?  The names of commit objects in the
> submodule repository?
> 
> I'd prefer to see that exact thing used to construct the function
> name for a helper function with specific usage in mind, i.e.
> get_commit_object_names_for_submodule_path() or something along that
> line.

I did not name this function too precisely to keep it's name short since
everything specific was quite long, like the suggestion from Junio.

Since this is a static function local to the submodule file I was
assuming anyone interested would just look up the usage and immediately
see the purpose. If I look into submodule-cache.c where I have a similar
functionality we used 'lookup_or_create' for this create on demand
functionality. So a function name would be:

lookup_or_create_commit_objects_for_submodule_path(...

Which seems quite extensively long for a static function so how about
we shorten it a bit and add a comment:

/* lookup or create commit object list for submodule */
get_commit_objects_for_submodule_path(...

?

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-12 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 10:59:29AM -0700, Stefan Beller wrote:
> On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt  wrote:
> > +static void free_submodules_sha1s(struct string_list *submodules)
> > +{
> > +   int i;
> > +   for (i = 0; i < submodules->nr; i++) {
> > +   struct string_list_item *item = >items[i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

Will change.

> > @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char 
> > new_sha1[20],
> > die("revision walk setup failed");
> >
> > while ((commit = get_revision()) != NULL)
> > -   find_unpushed_submodule_commits(commit, needs_pushing);
> > +   find_unpushed_submodule_commits(commit, );
> >
> > reset_revision_walk();
> > free(sha1_copy);
> > strbuf_release(_arg);
> >
> > +   for (i = 0; i < submodules.nr; i++) {
> > +   struct string_list_item *item = [i];
> 
> You do not seem to make use of `i` explicitely, so
> for_each_string_list_item might be more readable here?

As above.

Cheers Heiko


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-10 Thread Junio C Hamano
Stefan Beller  writes:

>> +static struct sha1_array *get_sha1s_from_list(struct string_list 
>> *submodules,
>> +   const char *path)
>
> So this will take the stringlist `submodules` and insert the path into it,
> if it wasn't already in there. In case it is newly inserted, add a sha1_array
> as util, so each inserted path has it's own empty array.
>
> So it is both init of the data structures as well as retrieving them. I was
> initially confused by the name as I assumed it would give you sha1s out
> of a string list (e.g. transform strings to internal sha1 things).
> Maybe it's just
> me having a hard time to understand that, but I feel like the name could be
> improved.
>
> lookup_sha1_list_by_path,
> insert_path_and_return_sha1_list ?

I do not think either the name or the "find if exists otherwise
initialize one" behaviour is particularly confusing, but I do not
think "maintain a set of sha1_arrays keyed with a string" is a so
widely reusable general concept/construct.  As can be seen easily in
the names of parameters, this function is about maintaining a set of
sha1_arrays keyed by paths to submodules, and I also assume that the
array indexed by path is not meant to be a general purpose "we can
use it to store any 40-hex thing" but to store something specific.

What is that specific thing?  The names of commit objects in the
submodule repository?

I'd prefer to see that exact thing used to construct the function
name for a helper function with specific usage in mind, i.e.
get_commit_object_names_for_submodule_path() or something along that
line.


Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-07 Thread Stefan Beller
On Fri, Oct 7, 2016 at 8:06 AM, Heiko Voigt  wrote:
> To check whether a submodule needs to be pushed we need to collect all
> changed submodules. Lets collect them first and then execute the
> possibly expensive test whether certain revisions are already pushed
> only once per submodule.
>
> There is further potential for optimization since we can assemble one
> command and only issued that instead of one call for each remote ref in
> the submodule.
>
> Signed-off-by: Heiko Voigt 
> ---
>  submodule.c | 63 
> -
>  1 file changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 2de06a3351..59c9d15905 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, 
> const unsigned char sha1[20
> return 0;
>  }
>
> +static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
> +   const char *path)

So this will take the stringlist `submodules` and insert the path into it,
if it wasn't already in there. In case it is newly inserted, add a sha1_array
as util, so each inserted path has it's own empty array.

So it is both init of the data structures as well as retrieving them. I was
initially confused by the name as I assumed it would give you sha1s out
of a string list (e.g. transform strings to internal sha1 things).
Maybe it's just
me having a hard time to understand that, but I feel like the name could be
improved.

lookup_sha1_list_by_path,
insert_path_and_return_sha1_list ?

> +{
> +   struct string_list_item *item;
> +
> +   item = string_list_insert(submodules, path);
> +   if (item->util)
> +   return (struct sha1_array *) item->util;
> +
> +   /* NEEDSWORK: should we have sha1_array_init()? */
> +   item->util = xcalloc(1, sizeof(struct sha1_array));
> +   return (struct sha1_array *) item->util;
> +}
> +
>  static void collect_submodules_from_diff(struct diff_queue_struct *q,
>  struct diff_options *options,
>  void *data)
>  {
> int i;
> -   struct string_list *needs_pushing = data;
> +   struct string_list *submodules = data;
>
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> +   struct sha1_array *hashes;
> if (!S_ISGITLINK(p->two->mode))
> continue;
> -   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
> -   string_list_insert(needs_pushing, p->two->path);
> +   hashes = get_sha1s_from_list(submodules, p->two->path);
> +   sha1_array_append(hashes, p->two->oid.hash);
> }
>  }
>
> @@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct 
> commit *commit,
> diff_tree_combined_merge(commit, 1, );
>  }
>
> +struct collect_submodule_from_sha1s_data {
> +   char *submodule_path;
> +   struct string_list *needs_pushing;
> +};
> +
> +static void collect_submodules_from_sha1s(const unsigned char sha1[20],
> +   void *data)
> +{
> +   struct collect_submodule_from_sha1s_data *me =
> +   (struct collect_submodule_from_sha1s_data *) data;
> +
> +   if (submodule_needs_pushing(me->submodule_path, sha1))
> +   string_list_insert(me->needs_pushing, me->submodule_path);
> +}
> +
> +static void free_submodules_sha1s(struct string_list *submodules)
> +{
> +   int i;
> +   for (i = 0; i < submodules->nr; i++) {
> +   struct string_list_item *item = >items[i];

You do not seem to make use of `i` explicitely, so
for_each_string_list_item might be more readable here?


> +   struct sha1_array *hashes = (struct sha1_array *) item->util;
> +   sha1_array_clear(hashes);
> +   }
> +   string_list_clear(submodules, 1);
> +}
> +
>  int find_unpushed_submodules(unsigned char new_sha1[20],
> const char *remotes_name, struct string_list *needs_pushing)
>  {
> struct rev_info rev;
> struct commit *commit;
> const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
> -   int argc = ARRAY_SIZE(argv) - 1;
> +   int argc = ARRAY_SIZE(argv) - 1, i;
> char *sha1_copy;
> +   struct string_list submodules = STRING_LIST_INIT_DUP;
>
> struct strbuf remotes_arg = STRBUF_INIT;
>
> @@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
> die("revision walk setup failed");
>
> while ((commit = get_revision()) != NULL)
> -   find_unpushed_submodule_commits(commit, needs_pushing);
> +   find_unpushed_submodule_commits(commit, );
>
> reset_revision_walk();
> free(sha1_copy);
> strbuf_release(_arg);
>
> +   for 

[PATCH v2 1/3] serialize collection of changed submodules

2016-10-07 Thread Heiko Voigt
To check whether a submodule needs to be pushed we need to collect all
changed submodules. Lets collect them first and then execute the
possibly expensive test whether certain revisions are already pushed
only once per submodule.

There is further potential for optimization since we can assemble one
command and only issued that instead of one call for each remote ref in
the submodule.

Signed-off-by: Heiko Voigt 
---
 submodule.c | 63 -
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2de06a3351..59c9d15905 100644
--- a/submodule.c
+++ b/submodule.c
@@ -554,19 +554,34 @@ static int submodule_needs_pushing(const char *path, 
const unsigned char sha1[20
return 0;
 }
 
+static struct sha1_array *get_sha1s_from_list(struct string_list *submodules,
+   const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct sha1_array *) item->util;
+
+   /* NEEDSWORK: should we have sha1_array_init()? */
+   item->util = xcalloc(1, sizeof(struct sha1_array));
+   return (struct sha1_array *) item->util;
+}
+
 static void collect_submodules_from_diff(struct diff_queue_struct *q,
 struct diff_options *options,
 void *data)
 {
int i;
-   struct string_list *needs_pushing = data;
+   struct string_list *submodules = data;
 
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
+   struct sha1_array *hashes;
if (!S_ISGITLINK(p->two->mode))
continue;
-   if (submodule_needs_pushing(p->two->path, p->two->oid.hash))
-   string_list_insert(needs_pushing, p->two->path);
+   hashes = get_sha1s_from_list(submodules, p->two->path);
+   sha1_array_append(hashes, p->two->oid.hash);
}
 }
 
@@ -582,14 +597,41 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
+struct collect_submodule_from_sha1s_data {
+   char *submodule_path;
+   struct string_list *needs_pushing;
+};
+
+static void collect_submodules_from_sha1s(const unsigned char sha1[20],
+   void *data)
+{
+   struct collect_submodule_from_sha1s_data *me =
+   (struct collect_submodule_from_sha1s_data *) data;
+
+   if (submodule_needs_pushing(me->submodule_path, sha1))
+   string_list_insert(me->needs_pushing, me->submodule_path);
+}
+
+static void free_submodules_sha1s(struct string_list *submodules)
+{
+   int i;
+   for (i = 0; i < submodules->nr; i++) {
+   struct string_list_item *item = >items[i];
+   struct sha1_array *hashes = (struct sha1_array *) item->util;
+   sha1_array_clear(hashes);
+   }
+   string_list_clear(submodules, 1);
+}
+
 int find_unpushed_submodules(unsigned char new_sha1[20],
const char *remotes_name, struct string_list *needs_pushing)
 {
struct rev_info rev;
struct commit *commit;
const char *argv[] = {NULL, NULL, "--not", "NULL", NULL};
-   int argc = ARRAY_SIZE(argv) - 1;
+   int argc = ARRAY_SIZE(argv) - 1, i;
char *sha1_copy;
+   struct string_list submodules = STRING_LIST_INIT_DUP;
 
struct strbuf remotes_arg = STRBUF_INIT;
 
@@ -603,12 +645,23 @@ int find_unpushed_submodules(unsigned char new_sha1[20],
die("revision walk setup failed");
 
while ((commit = get_revision()) != NULL)
-   find_unpushed_submodule_commits(commit, needs_pushing);
+   find_unpushed_submodule_commits(commit, );
 
reset_revision_walk();
free(sha1_copy);
strbuf_release(_arg);
 
+   for (i = 0; i < submodules.nr; i++) {
+   struct string_list_item *item = [i];
+   struct collect_submodule_from_sha1s_data data;
+   data.submodule_path = item->string;
+   data.needs_pushing = needs_pushing;
+   sha1_array_for_each_unique((struct sha1_array *) item->util,
+   collect_submodules_from_sha1s,
+   );
+   }
+   free_submodules_sha1s();
+
return needs_pushing->nr;
 }
 
-- 
2.10.1.637.g09b28c5