Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-11-03 Thread Stefan Beller
On Thu, Oct 29, 2015 at 4:50 PM, Ramsay Jones
 wrote:
>
>
> On 29/10/15 15:51, Stefan Beller wrote:
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>>  wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <command> --jobs 1 # should run just one task, despite having 42 configured
>
> Heh, yes ... I didn't pose the question quite right ...
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> ... but, despite that, you correctly inferred what I was really
> asking about! :)
>
> I was just wondering what overhead the parallel processing machinery
> adds to the original 'non-parallel' code path (for the j=1 case).
> I suspect the answer is 'not much', but that's just a guess.
> Have you measured it?

Totally unscientific:
 * Make a copy of my current gerrit repository and time the fetch.
 * That repo contains 5 submodules, one needs fetching

time git fetch --recurse-submodules=yes --jobs=1 # this series
real 0m7.150s
user 0m3.459s
sys 0m1.126s

time git fetch --recurse-submodules=yes # origin/master
real 0m7.667s
user 0m3.439s
sys 0m1.190s

Now let's repeat the test a few more times to rule out cold caches or
network hiccups (there is also nothing to fetch, so it's more like doing
6 ls-remotes in a row, one for gerrit and 5 submodules):

this series, best out of 5:
real 0m3.971s
user 0m2.447s
sys 0m0.452s

this series, worst out of 5:
real 0m4.229s
user 0m2.506s
sys 0m0.413s

origin/master, best out of 5:
real 0m3.968s
user 0m2.516s
sys 0m0.380s

origin/master, worst out of 5:
real 0m4.217s
user 0m2.472s
sys 0m0.408s

The real time differs by < 1 % between the two in
both the best and worst case.

If you really care about 1 % of performance, you'd want to fetch in
parallel anyway?
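
For reference, the relative difference behind the "< 1 %" figure can be recomputed from the "real" timings quoted above. This is only a sketch: `overhead_pct` is a hypothetical helper name, not something from the series, and it simply evaluates (series - baseline) / baseline * 100 with awk.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch series): relative difference
# in percent between a measured time and a baseline time, both in seconds.
overhead_pct() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v s="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (s - b) / b * 100 }'
}

# "real" times from the best-of-5 and worst-of-5 runs quoted above:
# first argument is this series, second is origin/master.
echo "best:  $(overhead_pct 3.971 3.968) %"
echo "worst: $(overhead_pct 4.229 4.217) %"
```

With the numbers above this comes out to roughly 0.08 % for the best runs and 0.28 % for the worst runs, which is well inside the run-to-run variation of the measurements themselves.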


> What happens if there is only a single
> submodule to fetch?

Ok let's see. I created https://github.com/stefanbeller/test-sub-1
to play around with it. However, the difference between
time git fetch --recurse-submodules=yes
and
time git fetch --recurse-submodules=yes --jobs 100
seems to be lost in the noise.

So I am not sure what the question is w.r.t. having just one
submodule.


>
> ATB,
> Ramsay Jones
>
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Ramsay Jones


On 28/10/15 23:21, Stefan Beller wrote:
> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
> 
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
> 
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted
> 
> Interdiff below
> 
> Stefan Beller (8):
>   run_processes_parallel: Add output to tracing messages
>   submodule config: keep update strategy around
>   submodule config: remove name_and_item_from_var
>   submodule-config: parse_config
>   fetching submodules: Respect `submodule.jobs` config option
>   git submodule update: have a dedicated helper for cloning
>   submodule update: expose parallelism to the user
>   clone: allow an explicit argument for parallel submodule clones
> 
>  Documentation/config.txt        |   7 ++
>  Documentation/git-clone.txt     |   6 +-
>  Documentation/git-submodule.txt |   7 +-
>  builtin/clone.c                 |  23 +++-
>  builtin/fetch.c                 |   2 +-
>  builtin/submodule--helper.c     | 244
>  git-submodule.sh                |  54 -
>  run-command.c                   |   4 +
>  submodule-config.c              |  98 ++--
>  submodule-config.h              |   3 +
>  submodule.c                     |   5 +
>  t/t5526-fetch-submodules.sh     |  14 +++
>  t/t7400-submodule-basic.sh      |   4 +-
>  t/t7406-submodule-update.sh     |  27 +
>  14 files changed, 418 insertions(+), 80 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0de0138..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
>   "--ignore-submodules" option. The 'git submodule' commands are not
>   affected by this setting.
>  
> -submodule::jobs
> +submodule.jobs::
>   This is used to determine how many submodules can be operated on in
>   parallel. Specifying a positive integer allows up to that number
> - of submodules being fetched in parallel. Specifying 0 the number
> - of cpus will be taken as the maximum number. Currently this is
> - used in fetch and clone operations only.
> + of submodules being fetched in parallel. This is used in fetch
> + and clone operations only. A value of 0 will give some reasonable
> + default. The defaults may change with different versions of Git.
>  
>  tag.sort::
>   This variable controls the sort ordering of tags when displayed by
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index affa52e..01bd6b7 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -216,9 +216,10 @@ objects from the source repository into a pack in the 
> cloned repository.
>   The result is Git repository can be separated from working
>   tree.
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>   The number of submodules fetched at the same time.
> + Defaults to the `submodule.jobs` option.

Hmm, is there a way to _not_ fetch in parallel (override the
config) from the command line for a given command?

ATB,
Ramsay Jones

>  
>  <repository>::
>   The (possibly remote) repository to clone from.  See the
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index f5429fa..c70fafd 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` 
> options carefully.
>   clone with a history truncated to the specified number of revisions.
>   See linkgit:git-clone[1]
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>   This option is only valid for the update command.
>   Clone new submodules in parallel with as many jobs.
> + Defaults to the `submodule.jobs` option.
>  
>  <path>...::
>   Paths to submodule(s). When specified this will restrict the command
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5ac2d89..22b9924 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -727,10 +727,7 @@ static int checkout(void)
>   struct argv_array args = ARGV_ARRAY_INIT;
>   argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
>  
> - if (max_jobs == -1)
> - if (git_config_get_int("submodule.jobs", &max_jobs))
> - max_jobs = 1;
> - if (max_jobs != 1) {
> + if (max_jobs != -1) {
>   struct strbuf sb = STRBUF_INIT;
>   strbuf_addf(&sb, "--jobs=%d", max_jobs);
>

Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Stefan Beller
On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
 wrote:

> Hmm, is there a way to _not_ fetch in parallel (override the
> config) from the command line for a given command?
>
> ATB,
> Ramsay Jones

git config submodule.jobs 42
git <command> --jobs 1 # should run just one task, despite having 42 configured

It does use the parallel processing machinery though, but with a maximum of
one subcommand being spawned. Is that what you're asking?
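
The precedence described here (a `--jobs` value on the command line wins over `submodule.jobs`) can be sketched in plain shell. This is an illustration only: `effective_jobs` is a hypothetical function, not code from the series, and the fallback of 1 when neither is set is an assumption based on the lines the interdiff removes from builtin/clone.c.

```shell
#!/bin/sh
# Hypothetical sketch of the precedence, not the actual Git implementation.
# $1: value given via --jobs on the command line (empty if not given)
# $2: value of the submodule.jobs configuration (empty if unset)
effective_jobs() {
    if [ -n "$1" ]; then
        echo "$1"        # command line overrides the configuration
    elif [ -n "$2" ]; then
        echo "$2"        # fall back to the configured value
    else
        echo 1           # assumed default when nothing is specified
    fi
}

effective_jobs 1 42     # --jobs 1 with submodule.jobs=42
effective_jobs "" 42    # no --jobs, submodule.jobs=42
effective_jobs "" ""    # neither given
```

The three calls print 1, 42 and 1 respectively: the configured 42 only takes effect when no `--jobs` value is passed.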

Thanks,
Stefan


Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Stefan Beller
On Thu, Oct 29, 2015 at 10:23 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>>  wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <command> --jobs 1 # should run just one task, despite having 42 configured
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> With this patch, do we still keep a separate machinery that bypasses
> the parallel thing altogether in the first place?

No.

>
> I was hoping that the underlying parallel machinery is polished
> enough that using it with max=1 parallelism would be equivalent to
> serial execution.

There is no special code path for jobs=1.

It should be pretty close, just with the overhead of the parallel engine
spawning the subcommands one after the other and acting as an intermediary
for output piping. The one subcommand would still output via a pipe to the
parallel engine, which then outputs it immediately.
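
As a rough analogy only (this is not how run_processes_parallel is implemented), the behaviour can be imitated with `xargs -P`, which GNU and BSD xargs provide but POSIX does not require. The same code path runs every task, each child's output goes through a pipe, and with a maximum of 1 the tasks simply run one after the other. `run_limited` and the echoed "fetched" message are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical analogy, not Git's parallel engine.
# $1: maximum number of children at once; remaining args: task names.
run_limited() {
    max=$1
    shift
    # One child per task name; xargs never runs more than $max at once.
    # The trailing "cat" stands in for the engine relaying piped output.
    printf '%s\n' "$@" |
        xargs -P "$max" -n 1 sh -c 'echo "fetched $1"' _ |
        cat
}

run_limited 1 sub-a sub-b sub-c   # max=1: tasks run strictly in order
```

There is no separate serial branch in this sketch either: passing 1 as the limit is what makes it serial, at the cost of the extra pipe hop.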

> At least, that was my understanding of our goal,
> and back when we reviewed the previous "fetch --recurse-sub" series,
> my impression was we were already there.
>
> And in that ideal endgame world, your "Give '-j1' from the command
> line" would be perfectly an acceptable answer ;-).

ok. :)

>
> Thanks.
>


Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Junio C Hamano
Stefan Beller  writes:

> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
>
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
>
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted

A couple of things I noticed (other than "many issues pointed out in
v1 have been updated") are:

 - The ways 7/8 and 8/8 check for an uninitialized max_jobs are
   inconsistent.  The way 7/8 does it, i.e. (max_jobs < 0),
   looks more conventional.

 - "Defaults to the `submodule.jobs` option" should say
   "configuration variable" instead.

I haven't formed an opinion on 6/8 yet.


Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>  wrote:
>
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
>
> git config submodule.jobs 42
> git <command> --jobs 1 # should run just one task, despite having 42 configured
>
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

With this patch, do we still keep a separate machinery that bypasses
the parallel thing altogether in the first place?

I was hoping that the underlying parallel machinery is polished
enough that using it with max=1 parallelism would be equivalent to
serial execution.  At least, that was my understanding of our goal,
and back when we reviewed the previous "fetch --recurse-sub" series,
my impression was we were already there.

And in that ideal endgame world, your "Give '-j1' from the command
line" would be a perfectly acceptable answer ;-).

Thanks.
 


Re: [PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-29 Thread Ramsay Jones


On 29/10/15 15:51, Stefan Beller wrote:
> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>  wrote:
> 
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
> 
> git config submodule.jobs 42
> git <command> --jobs 1 # should run just one task, despite having 42 configured

Heh, yes ... I didn't pose the question quite right ...
> 
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

... but, despite that, you correctly inferred what I was really
asking about! :)

I was just wondering what overhead the parallel processing machinery
adds to the original 'non-parallel' code path (for the j=1 case).
I suspect the answer is 'not much', but that's just a guess.
Have you measured it? What happens if there is only a single
submodule to fetch?

ATB,
Ramsay Jones




[PATCHv2 0/8] Expose the submodule parallelism to the user

2015-10-28 Thread Stefan Beller
This replaces origin/sb/submodule-parallel-update
(anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
into sb/submodule-parallel-update)

What does it do?
---
This series should finish the ongoing efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

What is new in v2?
---
* The patches got reordered slightly
* Documentation was adapted

Interdiff below

Stefan Beller (8):
  run_processes_parallel: Add output to tracing messages
  submodule config: keep update strategy around
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  23 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 244
 git-submodule.sh                |  54 -
 run-command.c                   |   4 +
 submodule-config.c              |  98 ++--
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +
 14 files changed, 418 insertions(+), 80 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0de0138..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
"--ignore-submodules" option. The 'git submodule' commands are not
affected by this setting.
 
-submodule::jobs
+submodule.jobs::
This is used to determine how many submodules can be operated on in
parallel. Specifying a positive integer allows up to that number
-   of submodules being fetched in parallel. Specifying 0 the number
-   of cpus will be taken as the maximum number. Currently this is
-   used in fetch and clone operations only.
+   of submodules being fetched in parallel. This is used in fetch
+   and clone operations only. A value of 0 will give some reasonable
+   default. The defaults may change with different versions of Git.
 
 tag.sort::
This variable controls the sort ordering of tags when displayed by
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index affa52e..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -216,9 +216,10 @@ objects from the source repository into a pack in the 
cloned repository.
The result is Git repository can be separated from working
tree.
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
The number of submodules fetched at the same time.
+   Defaults to the `submodule.jobs` option.
 
 <repository>::
The (possibly remote) repository to clone from.  See the
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f5429fa..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` 
options carefully.
clone with a history truncated to the specified number of revisions.
See linkgit:git-clone[1]
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
This option is only valid for the update command.
Clone new submodules in parallel with as many jobs.
+   Defaults to the `submodule.jobs` option.
 
 <path>...::
Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/clone.c b/builtin/clone.c
index 5ac2d89..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -727,10 +727,7 @@ static int checkout(void)
struct argv_array args = ARGV_ARRAY_INIT;
argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
 
-   if (max_jobs == -1)
-   if (git_config_get_int("submodule.jobs", &max_jobs))
-   max_jobs = 1;
-   if (max_jobs != 1) {
+   if (max_jobs != -1) {
struct strbuf sb = STRBUF_INIT;
strbuf_addf(&sb, "--jobs=%d", max_jobs);
argv_array_push(&args, sb.buf);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c3d438a..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const 
char *prefix)
/* Overlay the parsed .gitmodules