Re: [PATCH 5/9] submodule update: expose parallelism to the user

2015-10-28 Thread Stefan Beller
On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano  wrote:
> And when 0 starts to meaning something special, we would need to
> describe that here (and/or submodule.jobs entry in config.txt).
> As I already said, I do not think "0 means num_cpus" is a useful
> default, and I would prefer if we reserved 0 to mean something more
> useful we would figure out later.

Ok I'll add that, too.

I am just debating with myself where the best place is.
In run-command.c in pp_init we have:

if (n < 1)
n = online_cpus();
pp->max_processes = n;

we would need to change only that one place to insert an

die("We haven't found the right default yet for 0");

However I think for most loads online_cpus makes sense as that
is ususally the bottleneck for local operations (if being excessive
memory may become an issue, but unlikely IMHO).
So instead I think it makes more sense to add it in the fetch/clone/update
to come up with a treatment for 0.

Maybe we want to make the explicit decision for the default value
for any user of the parallel processing, such that this code above
is misguided as it leads to bad defaults if reviewers are inattentive.

So having spelled out that, we may just want to bark in the pp_init
for having a number n < 1.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/9] submodule update: expose parallelism to the user

2015-10-28 Thread Junio C Hamano
Stefan Beller  writes:

> On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano  wrote:
>> And when 0 starts to meaning something special, we would need to
>> describe that here (and/or submodule.jobs entry in config.txt).
>> As I already said, I do not think "0 means num_cpus" is a useful
>> default, and I would prefer if we reserved 0 to mean something more
>> useful we would figure out later.
>
> Ok I'll add that, too.

Sorry, but I take it back.  We just can document that (1) "-j 0"
will give you some default, (2) we do not promise that the default
will be optimal for you from day one, (3) we reserve the right to
"improve" it over time, and (4) we promise that we won't make it an
insanely wrong value.  And let's keep "0 currently means num_cpu",
which may or may not be optimal but it cannot be an "insanely wrong"
value.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/9] submodule update: expose parallelism to the user

2015-10-27 Thread Stefan Beller
Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By having the variable initialized to -1, we make sure 0 can be passed
into the parallel processing machine, which will then pick as many parallel
workers as there are CPUs.

Signed-off-by: Stefan Beller 
---
 Documentation/git-submodule.txt |  6 +-
 builtin/submodule--helper.c | 17 +
 git-submodule.sh|  9 +
 t/t7406-submodule-update.sh | 12 
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..f5429fa 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] ...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
  [-f|--force] [--rebase|--merge] [--reference ]
- [--depth ] [--recursive] [--] [...]
+ [--depth ] [--recursive] [--jobs ] [--] [...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) ]
  [commit] [--] [...]
 'git submodule' [--quiet] foreach [--recursive] 
@@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` 
options carefully.
clone with a history truncated to the specified number of revisions.
See linkgit:git-clone[1]
 
+-j::
+--jobs::
+   This option is only valid for the update command.
+   Clone new submodules in parallel with as many jobs.
 
 ...::
Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ec1b85..c3d438a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -431,6 +431,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+   int max_jobs = -1;
struct string_list_item *item;
struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -451,6 +452,8 @@ static int update_clone(int argc, const char **argv, const 
char *prefix)
OPT_STRING(0, "depth", , "",
   N_("Create a shallow clone truncated to the "
  "specified number of revisions")),
+   OPT_INTEGER('j', "jobs", _jobs,
+   N_("parallel jobs")),
OPT__QUIET(, N_("do't print cloning progress")),
OPT_END()
};
@@ -472,10 +475,16 @@ static int update_clone(int argc, const char **argv, 
const char *prefix)
gitmodules_config();
/* Overlay the parsed .gitmodules file with .git/config */
git_config(git_submodule_config, NULL);
-   run_processes_parallel(1, update_clone_get_next_task,
- update_clone_start_failure,
- update_clone_task_finished,
- );
+
+   if (max_jobs == -1)
+   if (git_config_get_int("submodule.jobs", _jobs))
+   max_jobs = 1;
+
+   run_processes_parallel(max_jobs,
+  update_clone_get_next_task,
+  update_clone_start_failure,
+  update_clone_task_finished,
+  );
 
if (pp.print_unmatched) {
printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index ea883b9..c2dfb16 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -636,6 +636,14 @@ cmd_update()
--depth=*)
depth=$1
;;
+   -j|--jobs)
+   case "$2" in '') usage ;; esac
+   jobs="--jobs=$2"
+   shift
+   ;;
+   --jobs=*)
+   jobs=$1
+   ;;
--)
shift
break
@@ -661,6 +669,7 @@ cmd_update()
${update:+--update "$update"} \
${reference:+--reference "$reference"} \
${depth:+--depth "$depth"} \
+   ${jobs:+$jobs} \
"$@" | {
err=
while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops 
module name before recur
 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked 
out" actual
)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+   (cd super2 &&
+GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+grep "7 

Re: [PATCH 5/9] submodule update: expose parallelism to the user

2015-10-27 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` 
> options carefully.
>   clone with a history truncated to the specified number of revisions.
>   See linkgit:git-clone[1]
>  
> +-j::
> +--jobs::

This probably should be 

  -j ::
  --jobs ::

(see comments on [6/9]).  I know the option description in this file
is sloppy and does not say "--name " etc., as it should (but
it does say "--reference "), and fixing them may not be
within the scope of this series, but we do not need to add more to
the existing problems.

> + This option is only valid for the update command.
> + Clone new submodules in parallel with as many jobs.

And when 0 starts to meaning something special, we would need to
describe that here (and/or submodule.jobs entry in config.txt).
As I already said, I do not think "0 means num_cpus" is a useful
default, and I would prefer if we reserved 0 to mean something more
useful we would figure out later.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html