Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
On Thu, Oct 29, 2015 at 4:50 PM, Ramsay Jones wrote:
>
> On 29/10/15 15:51, Stefan Beller wrote:
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git --jobs 1 # should run just one task, despite having 42 configured
>
> Heh, yes ... I didn't pose the question quite right ...
>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> ... but, despite that, you correctly inferred what I was really
> asking about! :)
>
> I was just wondering what overhead the parallel processing machinery
> adds to the original 'non-parallel' code path (for the j=1 case).
> I suspect the answer is 'not much', but that's just a guess.
> Have you measured it?

Totally unscientific:

* Make a copy of my current gerrit repository and time the fetch.
* That repo contains 5 submodules, one needs fetching

    time git fetch --recurse-submodules=yes --jobs=1 # this series
    real 0m7.150s
    user 0m3.459s
    sys  0m1.126s

    time git fetch --recurse-submodules=yes # origin/master
    real 0m7.667s
    user 0m3.439s
    sys  0m1.190s

Now let's test a few more times repeatedly to avoid cold caches or
network hiccups, (also there is nothing to fetch, so it's more like
doing 6 ls-remotes in a row, one for gerrit and 5 submodules)

    this series, best out of 5:
    real 0m3.971s
    user 0m2.447s
    sys  0m0.452s

    this series, worst out of 5:
    real 0m4.229s
    user 0m2.506s
    sys  0m0.413s

    origin/master, best out of 5:
    real 0m3.968s
    user 0m2.516s
    sys  0m0.380s

    origin/master, worst out of 5:
    real 0m4.217s
    user 0m2.472s
    sys  0m0.408s

The ratio of real time taken longer is < 1 % in both the best and
worst case. If you really care about 1 % of performance, you'd want
to fetch in parallel anyway?

> What happens if there is only a single
> submodule to fetch?

Ok let's see.
I created https://github.com/stefanbeller/test-sub-1 to play around with it.
However

    time git fetch --recurse-submodules=yes

or

    time git fetch --recurse-submodules=yes --jobs 100

seems to be lost in the noise. So I am not sure what the question is
w.r.t. having just one submodule.

>
> ATB,
> Ramsay Jones
>
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
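As a cross-check of the "< 1 %" remark, the relative difference can be worked out from the "real" times quoted in the message. The following is only a small sketch; the helper name `overhead_pct` is made up for illustration and is not part of Git or of the series.

```shell
#!/bin/sh
# Hypothetical helper: relative slowdown of "this series" versus
# origin/master, in percent.
# Arguments: <seconds with the series> <seconds on origin/master>
overhead_pct() {
    awk -v series="$1" -v base="$2" \
        'BEGIN { printf "%.2f\n", (series - base) / base * 100 }'
}

# "real" times copied from the measurements in the message.
overhead_pct 3.971 3.968   # best of 5, prints 0.08
overhead_pct 4.229 4.217   # worst of 5, prints 0.28
```

Both figures are well below 1 %, and with only five runs per side they are within what one would expect from run-to-run noise.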
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
On 28/10/15 23:21, Stefan Beller wrote:
> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
>
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
>
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted
>
> Interdiff below
>
> Stefan Beller (8):
>   run_processes_parallel: Add output to tracing messages
>   submodule config: keep update strategy around
>   submodule config: remove name_and_item_from_var
>   submodule-config: parse_config
>   fetching submodules: Respect `submodule.jobs` config option
>   git submodule update: have a dedicated helper for cloning
>   submodule update: expose parallelism to the user
>   clone: allow an explicit argument for parallel submodule clones
>
>  Documentation/config.txt        |   7 ++
>  Documentation/git-clone.txt     |   6 +-
>  Documentation/git-submodule.txt |   7 +-
>  builtin/clone.c                 |  23 +++-
>  builtin/fetch.c                 |   2 +-
>  builtin/submodule--helper.c     | 244
>  git-submodule.sh                |  54 -
>  run-command.c                   |   4 +
>  submodule-config.c              |  98 ++--
>  submodule-config.h              |   3 +
>  submodule.c                     |   5 +
>  t/t5526-fetch-submodules.sh     |  14 +++
>  t/t7400-submodule-basic.sh      |   4 +-
>  t/t7406-submodule-update.sh     |  27 +
>  14 files changed, 418 insertions(+), 80 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0de0138..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
>  	"--ignore-submodules" option. The 'git submodule' commands are not
>  	affected by this setting.
>
> -submodule::jobs
> +submodule.jobs::
>  	This is used to determine how many submodules can be operated on in
>  	parallel. Specifying a positive integer allows up to that number
> -	of submodules being fetched in parallel. Specifying 0 the number
> -	of cpus will be taken as the maximum number. Currently this is
> -	used in fetch and clone operations only.
> +	of submodules being fetched in parallel. This is used in fetch
> +	and clone operations only. A value of 0 will give some reasonable
> +	default. The defaults may change with different versions of Git.
>
>  tag.sort::
>  	This variable controls the sort ordering of tags when displayed by
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index affa52e..01bd6b7 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
>  	The result is Git repository can be separated from working
>  	tree.
>
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	The number of submodules fetched at the same time.
> +	Defaults to the `submodule.jobs` option.

Hmm, is there a way to _not_ fetch in parallel (override the
config) from the command line for a given command?

ATB,
Ramsay Jones

>
>  <repository>::
>  	The (possibly remote) repository to clone from. See the
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index f5429fa..c70fafd 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
>  	clone with a history truncated to the specified number of revisions.
>  	See linkgit:git-clone[1]
>
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	This option is only valid for the update command.
>  	Clone new submodules in parallel with as many jobs.
> +	Defaults to the `submodule.jobs` option.
>
>  <path>...::
>  	Paths to submodule(s). When specified this will restrict the command
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5ac2d89..22b9924 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -727,10 +727,7 @@ static int checkout(void)
>  		struct argv_array args = ARGV_ARRAY_INIT;
>  		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
>
> -		if (max_jobs == -1)
> -			if (git_config_get_int("submodule.jobs", &max_jobs))
> -				max_jobs = 1;
> -		if (max_jobs != 1) {
> +		if (max_jobs != -1) {
>  			struct strbuf sb = STRBUF_INIT;
>  			strbuf_addf(&sb, "--jobs=%d", max_jobs);
>
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones wrote:

> Hmm, is there a way to _not_ fetch in parallel (override the
> config) from the command line for a given command?
>
> ATB,
> Ramsay Jones

    git config submodule.jobs 42
    git --jobs 1 # should run just one task, despite having 42 configured

It does use the parallel processing machinery though, but with a maximum of
one subcommand being spawned. Is that what you're asking?

Thanks,
Stefan
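The precedence described in this reply (a `--jobs` value on the command line beats the configured `submodule.jobs`) can be sketched as a small shell function. This is an illustration under assumptions, not code from the series: `effective_jobs` is a hypothetical name, "-1" stands for "option not given" as in the builtin/clone.c hunk of the cover letter, and the fallback of one job mirrors the code that hunk removes.

```shell
#!/bin/sh
# Hypothetical helper, not part of Git: pick the effective job count.
# $1: value passed via --jobs, or -1 if the option was not given
# $2: value of the submodule.jobs configuration variable, or empty if unset
effective_jobs() {
    if [ "$1" -ge 0 ]; then
        echo "$1"    # an explicit command-line value wins over the config
    elif [ -n "$2" ]; then
        echo "$2"    # otherwise fall back to the configured value
    else
        echo 1       # assumption: with nothing set, run one job at a time
    fi
}

effective_jobs 1 42    # --jobs 1 with submodule.jobs=42, prints 1
effective_jobs -1 42   # no --jobs given, prints 42
effective_jobs -1 ""   # neither given, prints 1
```

A value of 0 is passed through unchanged here; per the documentation hunk it asks Git to choose "some reasonable default", which this sketch does not try to model.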
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
On Thu, Oct 29, 2015 at 10:23 AM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git --jobs 1 # should run just one task, despite having 42 configured
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> With this patch, do we still keep a separate machinery that bypasses
> the parallel thing altogether in the first place?

No.

> I was hoping that the underlying parallel machinery is polished
> enough that using it with max=1 parallelism would be equivalent to
> serial execution.

There is no special code path for jobs=1. It should be pretty close,
just with the overhead of the parallel engine spawning it one after
the other and being an intermediate for output piping. The one
subcommand would still output via a pipe to the parallel engine, which
then outputs it immediately.

> At least, that was my understanding of our goal,
> and back when we reviewed the previous "fetch --recurse-sub" series,
> my impression was we were already there.
>
> And in that ideal endgame world, your "Give '-j1' from the command
> line" would be perfectly an acceptable answer ;-).

ok. :)

>
> Thanks.
>
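To make the "max=1 behaves like serial execution" point concrete, here is a deliberately simplified shell sketch of a job limiter. It is not how Git's parallel engine works internally: `run_limited` is a made-up name, it waits for whole batches rather than refilling free slots, and it does not model the output piping mentioned above.

```shell
#!/bin/sh
# Simplified illustration only: start each command string in the
# background and wait whenever "max" of them have been started.
# $1: maximum number of commands running at once; remaining args: commands
run_limited() {
    rl_max=$1
    shift
    rl_running=0
    for rl_cmd in "$@"; do
        sh -c "$rl_cmd" &
        rl_running=$((rl_running + 1))
        if [ "$rl_running" -ge "$rl_max" ]; then
            wait            # drain the current batch before starting more
            rl_running=0
        fi
    done
    wait                    # drain whatever is left over
}

# With a limit of 1, each command finishes before the next one starts,
# so the output order matches plain serial execution.
run_limited 1 'echo first' 'echo second' 'echo third'
```

With a larger limit the same function lets commands overlap, which is the only difference between the two cases in this sketch.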
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
Stefan Beller writes:

> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
>
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
>
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted

A couple of things I noticed (other than "many issues pointed out in
v1 have been updated") are:

 - The way 7/8 and 8/8 checks for uninitialized max_jobs are
   inconsistently written. The way 7/8 does, i.e. (max_jobs < 0),
   looks more conventional.

 - "Defaults to the `submodule.jobs` option" should say
   "configuration variable" instead.

I haven't formed an opinion on 6/8 yet.
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
Stefan Beller writes:

> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones wrote:
>
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
>
> git config submodule.jobs 42
> git --jobs 1 # should run just one task, despite having 42 configured
>
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

With this patch, do we still keep a separate machinery that bypasses
the parallel thing altogether in the first place?

I was hoping that the underlying parallel machinery is polished
enough that using it with max=1 parallelism would be equivalent to
serial execution. At least, that was my understanding of our goal,
and back when we reviewed the previous "fetch --recurse-sub" series,
my impression was we were already there.

And in that ideal endgame world, your "Give '-j1' from the command
line" would be perfectly an acceptable answer ;-).

Thanks.
Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
On 29/10/15 15:51, Stefan Beller wrote:
> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones wrote:
>
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
>
> git config submodule.jobs 42
> git --jobs 1 # should run just one task, despite having 42 configured

Heh, yes ... I didn't pose the question quite right ...

>
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

... but, despite that, you correctly inferred what I was really
asking about! :)

I was just wondering what overhead the parallel processing machinery
adds to the original 'non-parallel' code path (for the j=1 case).
I suspect the answer is 'not much', but that's just a guess.
Have you measured it? What happens if there is only a single
submodule to fetch?

ATB,
Ramsay Jones
[PATCHv2 0/8] Expose the submodule parallelism to the user
This replaces origin/sb/submodule-parallel-update
(anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
into sb/submodule-parallel-update)

What does it do?
---
This series should finish the ongoing efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

What is new in v2?
---
* The patches got reordered slightly
* Documentation was adapted

Interdiff below

Stefan Beller (8):
  run_processes_parallel: Add output to tracing messages
  submodule config: keep update strategy around
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  23 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 244
 git-submodule.sh                |  54 -
 run-command.c                   |   4 +
 submodule-config.c              |  98 ++--
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +
 14 files changed, 418 insertions(+), 80 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0de0138..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
-submodule::jobs
+submodule.jobs::
 	This is used to determine how many submodules can be operated on in
 	parallel. Specifying a positive integer allows up to that number
-	of submodules being fetched in parallel. Specifying 0 the number
-	of cpus will be taken as the maximum number. Currently this is
-	used in fetch and clone operations only.
+	of submodules being fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	default. The defaults may change with different versions of Git.
 
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index affa52e..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from. See the
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f5429fa..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	This option is only valid for the update command.
 	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/clone.c b/builtin/clone.c
index 5ac2d89..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -727,10 +727,7 @@ static int checkout(void)
 		struct argv_array args = ARGV_ARRAY_INIT;
 		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
 
-		if (max_jobs == -1)
-			if (git_config_get_int("submodule.jobs", &max_jobs))
-				max_jobs = 1;
-		if (max_jobs != 1) {
+		if (max_jobs != -1) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addf(&sb, "--jobs=%d", max_jobs);
 			argv_array_push(&args, sb.buf);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c3d438a..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	/* Overlay the parsed .gitmodules