Re: [PATCH] Enable parallelism in git submodule update.

2012-11-03 Thread Jens Lehmann
Am 03.11.2012 19:44, schrieb Phil Hord:
> On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann  wrote:
>> Am 30.10.2012 19:11, schrieb Stefan Zager:
>>> This is a refresh of a conversation from a couple of months ago.
>>>
>>> I didn't try to implement all the desired features (e.g., smart logic
>>> for passing a -j parameter to recursive submodule invocations), but I
>>> did address the one issue that Junio insisted on: the code makes a
>>> best effort to detect whether xargs supports parallel execution on the
>>> host platform, and if it doesn't, then it prints a warning and falls
>>> back to serial execution.
>>
>> I suspect not passing on --jobs recursively like you do here is the
>> right thing to do, as that would give exponential growth of jobs with
>> recursion depth, which makes no sense to me.
> 
> On the other hand, since $jobs is still defined when the recursive
> call to is made to 'eval cmd_update "$orig_flags"', I suspect the
> value *is* passed down recursively.

But for $jobs != 1 Stefan's code doesn't use eval cmd_update but
starts the submodule script again:

+   xargs $max_lines -P "$jobs" git submodule update 
$orig_flags

That should get rid of the $jobs setting, or am I missing something?

>  Maybe $jobs should be manually
> reset before recursing -- unless it is "0" -- though I expect someone
> would feel differently if she had one submodule on level 1 and 10
> submodules on level 2.  She would be surprised, then, when  --jobs=10
> seemed to have little affect on performance.

Hmm, good point. However we implement that, it should at least be
properly documented in the man page (and in the use case you describe
a "git submodule foreach 'git submodule update -j 10'" could be the
solution if we choose to not propagate the jobs option).

>  So maybe it is best to
> leave it as it is, excepting that the apparent attempt not to pass the
> switch down is probably misleading.

I didn't test it, but I think it should work (famous last words ;-).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-11-03 Thread Jens Lehmann
Am 29.07.2012 17:37, schrieb Jens Lehmann:
> Am 27.07.2012 20:37, schrieb Stefan Zager:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
> 
> I think this is a sound idea, but it would be good to see some
> actual measurements. What are the performance numbers with and
> without this change? Which cases do benefit and are there some
> which run slower when run in parallel?

ping?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-11-03 Thread Phil Hord
On Sat, Nov 3, 2012 at 11:42 AM, Jens Lehmann  wrote:
> Am 30.10.2012 19:11, schrieb Stefan Zager:
>> This is a refresh of a conversation from a couple of months ago.
>>
>> I didn't try to implement all the desired features (e.g., smart logic
>> for passing a -j parameter to recursive submodule invocations), but I
>> did address the one issue that Junio insisted on: the code makes a
>> best effort to detect whether xargs supports parallel execution on the
>> host platform, and if it doesn't, then it prints a warning and falls
>> back to serial execution.
>
> I suspect not passing on --jobs recursively like you do here is the
> right thing to do, as that would give exponential growth of jobs with
> recursion depth, which makes no sense to me.

On the other hand, since $jobs is still defined when the recursive
call to is made to 'eval cmd_update "$orig_flags"', I suspect the
value *is* passed down recursively.  Maybe $jobs should be manually
reset before recursing -- unless it is "0" -- though I expect someone
would feel differently if she had one submodule on level 1 and 10
submodules on level 2.  She would be surprised, then, when  --jobs=10
seemed to have little affect on performance.  So maybe it is best to
leave it as it is, excepting that the apparent attempt not to pass the
switch down is probably misleading.

Phil
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-11-03 Thread Jens Lehmann
Am 30.10.2012 19:11, schrieb Stefan Zager:
> This is a refresh of a conversation from a couple of months ago.
> 
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.

I suspect not passing on --jobs recursively like you do here is the
right thing to do, as that would give exponential growth of jobs with
recursion depth, which makes no sense to me.

A still unsolved issue is the unstructured output from the different
update jobs. It'll be hard (if not impossible) to see in what submodule
which update took place (or failed). I think we should have a solution
for that too (maybe one of those Heiko mentioned or something as simple
as implying "-q"?).

> Stefan
> 
> On Tue, Oct 30, 2012 at 11:03 AM,   wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>
>> Signed-off-by: Stefan Zager 
>> ---
>>  Documentation/git-submodule.txt |8 ++-
>>  git-submodule.sh|   40 
>> ++-
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/git-submodule.txt 
>> b/Documentation/git-submodule.txt
>> index b4683bb..cb23ba7 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -14,7 +14,8 @@ SYNOPSIS
>>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
>>  'git submodule' [--quiet] init [--] [...]
>>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
>> - [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> + [--reference ] [--merge] [--recursive]
>> + [-j|--jobs [jobs]] [--] [...]
>>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) 
>> ]
>>   [commit] [--] [...]
>>  'git submodule' [--quiet] foreach [--recursive] 
>> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just 
>> want to use the
>>  setting as stored in .gitmodules, you can automatically initialize the
>>  submodule with the `--init` option.
>>  +
>> +By default, each submodule is treated serially.  You may specify a degree of
>> +parallel execution with the --jobs flag.  If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all 
>> jobs are
>> +run in parallel.
>> ++

The new "--jobs" option should be documented under "OPTIONS", (and maybe
include that "--jobs 0" does the same as "--jobs" alone and that this is
not supported on all platforms).

>>  If `--recursive` is specified, this command will recurse into the
>>  registered submodules, and update any nested submodules within.
>>  +
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index ab6b110..60a5f96 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] 
>> [--]  []
>> or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
>> or: $dashless [--quiet] init [--] [...]
>> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
>> [jobs]] [--] [...]
>> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
>> [commit] [--] [...]
>> or: $dashless [--quiet] foreach [--recursive] 
>> or: $dashless [--quiet] sync [--] [...]"
>> @@ -500,6 +500,7 @@ cmd_update()
>>  {
>> # parse $args after "submodule ... update".
>> orig_flags=
>> +   jobs="1"
>> while test $# -ne 0
>> do
>> case "$1" in
>> @@ -518,6 +519,20 @@ cmd_update()
>> -r|--rebase)
>> update="rebase"
>> ;;
>> +   -j|--jobs)
>> +   case "$2" in
>> +   ''|-*)
>> +   jobs="0"
>> +   ;;
>> +   *)
>> +   jobs="$2"
>> +   shift
>> +   ;;
>> +   esac
>> +   # Don't preserve this arg.
>> +   shift
>> +   continue
>> +   ;;
>> --reference)
>> case "$2" in '') usage ;; esac
>> reference="--reference=$2"
>> @@ -551,11 +566,34 @@ cmd_update()
>> shift
>> done
>>
>> +  

Re: [PATCH] Enable parallelism in git submodule update.

2012-11-02 Thread Stefan Zager
ping?

On Tue, Oct 30, 2012 at 11:11 AM, Stefan Zager  wrote:
> This is a refresh of a conversation from a couple of months ago.
>
> I didn't try to implement all the desired features (e.g., smart logic
> for passing a -j parameter to recursive submodule invocations), but I
> did address the one issue that Junio insisted on: the code makes a
> best effort to detect whether xargs supports parallel execution on the
> host platform, and if it doesn't, then it prints a warning and falls
> back to serial execution.
>
> Stefan
>
> On Tue, Oct 30, 2012 at 11:03 AM,   wrote:
>> The --jobs parameter may be used to set the degree of per-submodule
>> parallel execution.
>>
>> Signed-off-by: Stefan Zager 
>> ---
>>  Documentation/git-submodule.txt |8 ++-
>>  git-submodule.sh|   40 
>> ++-
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/git-submodule.txt 
>> b/Documentation/git-submodule.txt
>> index b4683bb..cb23ba7 100644
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -14,7 +14,8 @@ SYNOPSIS
>>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
>>  'git submodule' [--quiet] init [--] [...]
>>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
>> - [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> + [--reference ] [--merge] [--recursive]
>> + [-j|--jobs [jobs]] [--] [...]
>>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) 
>> ]
>>   [commit] [--] [...]
>>  'git submodule' [--quiet] foreach [--recursive] 
>> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just 
>> want to use the
>>  setting as stored in .gitmodules, you can automatically initialize the
>>  submodule with the `--init` option.
>>  +
>> +By default, each submodule is treated serially.  You may specify a degree of
>> +parallel execution with the --jobs flag.  If a parameter is provided, it is
>> +the maximum number of jobs to run in parallel; without a parameter, all 
>> jobs are
>> +run in parallel.
>> ++
>>  If `--recursive` is specified, this command will recurse into the
>>  registered submodules, and update any nested submodules within.
>>  +
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index ab6b110..60a5f96 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] 
>> [--]  []
>> or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
>> or: $dashless [--quiet] init [--] [...]
>> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [--] 
>> [...]
>> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
>> [--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
>> [jobs]] [--] [...]
>> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
>> [commit] [--] [...]
>> or: $dashless [--quiet] foreach [--recursive] 
>> or: $dashless [--quiet] sync [--] [...]"
>> @@ -500,6 +500,7 @@ cmd_update()
>>  {
>> # parse $args after "submodule ... update".
>> orig_flags=
>> +   jobs="1"
>> while test $# -ne 0
>> do
>> case "$1" in
>> @@ -518,6 +519,20 @@ cmd_update()
>> -r|--rebase)
>> update="rebase"
>> ;;
>> +   -j|--jobs)
>> +   case "$2" in
>> +   ''|-*)
>> +   jobs="0"
>> +   ;;
>> +   *)
>> +   jobs="$2"
>> +   shift
>> +   ;;
>> +   esac
>> +   # Don't preserve this arg.
>> +   shift
>> +   continue
>> +   ;;
>> --reference)
>> case "$2" in '') usage ;; esac
>> reference="--reference=$2"
>> @@ -551,11 +566,34 @@ cmd_update()
>> shift
>> done
>>
>> +   # Correctly handle the case where '-q' came before 'update' on the 
>> command line.
>> +   if test -n "$GIT_QUIET"
>> +   then
>> +   orig_flags="$orig_flags -q"
>> +   fi
>> +
>> if test -n "$init"
>> then
>> cmd_init "--" "$@" || return
>> fi
>>
>> +   if test "$jobs" != 1
>> +   then
>> +   if ( echo test | xargs -P "$jobs" true 2>/dev/null )
>> +   then
>> +   if ( echo test | xargs --max-lines=1 true 
>> 2>/dev/null ); then
>> +   max_lines="--max-lines=1"
>> +   else
>> +   

Re: [PATCH] Enable parallelism in git submodule update.

2012-10-30 Thread Stefan Zager
This is a refresh of a conversation from a couple of months ago.

I didn't try to implement all the desired features (e.g., smart logic
for passing a -j parameter to recursive submodule invocations), but I
did address the one issue that Junio insisted on: the code makes a
best effort to detect whether xargs supports parallel execution on the
host platform, and if it doesn't, then it prints a warning and falls
back to serial execution.

Stefan

On Tue, Oct 30, 2012 at 11:03 AM,   wrote:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.
>
> Signed-off-by: Stefan Zager 
> ---
>  Documentation/git-submodule.txt |8 ++-
>  git-submodule.sh|   40 
> ++-
>  2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index b4683bb..cb23ba7 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -14,7 +14,8 @@ SYNOPSIS
>  'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
>  'git submodule' [--quiet] init [--] [...]
>  'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
> - [--reference ] [--merge] [--recursive] [--] 
> [...]
> + [--reference ] [--merge] [--recursive]
> + [-j|--jobs [jobs]] [--] [...]
>  'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) 
> ]
>   [commit] [--] [...]
>  'git submodule' [--quiet] foreach [--recursive] 
> @@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just 
> want to use the
>  setting as stored in .gitmodules, you can automatically initialize the
>  submodule with the `--init` option.
>  +
> +By default, each submodule is treated serially.  You may specify a degree of
> +parallel execution with the --jobs flag.  If a parameter is provided, it is
> +the maximum number of jobs to run in parallel; without a parameter, all jobs 
> are
> +run in parallel.
> ++
>  If `--recursive` is specified, this command will recurse into the
>  registered submodules, and update any nested submodules within.
>  +
> diff --git a/git-submodule.sh b/git-submodule.sh
> index ab6b110..60a5f96 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
>  USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] 
> [--]  []
> or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
> or: $dashless [--quiet] init [--] [...]
> -   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
> [--rebase] [--reference ] [--merge] [--recursive] [--] [...]
> +   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
> [--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
> [jobs]] [--] [...]
> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
> [commit] [--] [...]
> or: $dashless [--quiet] foreach [--recursive] 
> or: $dashless [--quiet] sync [--] [...]"
> @@ -500,6 +500,7 @@ cmd_update()
>  {
> # parse $args after "submodule ... update".
> orig_flags=
> +   jobs="1"
> while test $# -ne 0
> do
> case "$1" in
> @@ -518,6 +519,20 @@ cmd_update()
> -r|--rebase)
> update="rebase"
> ;;
> +   -j|--jobs)
> +   case "$2" in
> +   ''|-*)
> +   jobs="0"
> +   ;;
> +   *)
> +   jobs="$2"
> +   shift
> +   ;;
> +   esac
> +   # Don't preserve this arg.
> +   shift
> +   continue
> +   ;;
> --reference)
> case "$2" in '') usage ;; esac
> reference="--reference=$2"
> @@ -551,11 +566,34 @@ cmd_update()
> shift
> done
>
> +   # Correctly handle the case where '-q' came before 'update' on the 
> command line.
> +   if test -n "$GIT_QUIET"
> +   then
> +   orig_flags="$orig_flags -q"
> +   fi
> +
> if test -n "$init"
> then
> cmd_init "--" "$@" || return
> fi
>
> +   if test "$jobs" != 1
> +   then
> +   if ( echo test | xargs -P "$jobs" true 2>/dev/null )
> +   then
> +   if ( echo test | xargs --max-lines=1 true 2>/dev/null 
> ); then
> +   max_lines="--max-lines=1"
> +   else
> +   max_lines="-L 1"
> +   fi
> +   module_list "$@" | awk '{print $4}' |
> +   xargs $max_lines -P "$jobs" git submodu

[PATCH] Enable parallelism in git submodule update.

2012-10-30 Thread szager
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager 
---
 Documentation/git-submodule.txt |8 ++-
 git-submodule.sh|   40 ++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index b4683bb..cb23ba7 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
 'git submodule' [--quiet] init [--] [...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference ] [--merge] [--recursive] [--] 
[...]
+ [--reference ] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) ]
  [commit] [--] [...]
 'git submodule' [--quiet] foreach [--recursive] 
@@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want 
to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs 
are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 +
diff --git a/git-submodule.sh b/git-submodule.sh
index ab6b110..60a5f96 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] [--] 
 []
or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
or: $dashless [--quiet] init [--] [...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [--] [...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
[jobs]] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
or: $dashless [--quiet] sync [--] [...]"
@@ -500,6 +500,7 @@ cmd_update()
 {
# parse $args after "submodule ... update".
orig_flags=
+   jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -518,6 +519,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+   -j|--jobs)
+   case "$2" in
+   ''|-*)
+   jobs="0"
+   ;;
+   *)
+   jobs="$2"
+   shift
+   ;;
+   esac
+   # Don't preserve this arg.
+   shift
+   continue
+   ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -551,11 +566,34 @@ cmd_update()
shift
done
 
+   # Correctly handle the case where '-q' came before 'update' on the 
command line.
+   if test -n "$GIT_QUIET"
+   then
+   orig_flags="$orig_flags -q"
+   fi
+
if test -n "$init"
then
cmd_init "--" "$@" || return
fi
 
+   if test "$jobs" != 1
+   then
+   if ( echo test | xargs -P "$jobs" true 2>/dev/null )
+   then
+   if ( echo test | xargs --max-lines=1 true 2>/dev/null 
); then
+   max_lines="--max-lines=1"
+   else
+   max_lines="-L 1"
+   fi
+   module_list "$@" | awk '{print $4}' |
+   xargs $max_lines -P "$jobs" git submodule update 
$orig_flags
+   return
+   else
+   echo "Warn: parallel execution is not supported on this 
platform."
+   fi
+   fi
+
cloned_modules=
module_list "$@" | {
err=
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Enable parallelism in git submodule update.

2012-10-30 Thread szager
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager 
---
 Documentation/git-submodule.txt |8 ++-
 git-submodule.sh|   40 ++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index b4683bb..cb23ba7 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
 'git submodule' [--quiet] init [--] [...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference ] [--merge] [--recursive] [--] 
[...]
+ [--reference ] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) ]
  [commit] [--] [...]
 'git submodule' [--quiet] foreach [--recursive] 
@@ -146,6 +147,11 @@ If the submodule is not yet initialized, and you just want 
to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs 
are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 +
diff --git a/git-submodule.sh b/git-submodule.sh
index ab6b110..60a5f96 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] [--] 
 []
or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
or: $dashless [--quiet] init [--] [...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [--] [...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
[jobs]] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
or: $dashless [--quiet] sync [--] [...]"
@@ -500,6 +500,7 @@ cmd_update()
 {
# parse $args after "submodule ... update".
orig_flags=
+   jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -518,6 +519,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+   -j|--jobs)
+   case "$2" in
+   ''|-*)
+   jobs="0"
+   ;;
+   *)
+   jobs="$2"
+   shift
+   ;;
+   esac
+   # Don't preserve this arg.
+   shift
+   continue
+   ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -551,11 +566,34 @@ cmd_update()
shift
done
 
+   # Correctly handle the case where '-q' came before 'update' on the 
command line.
+   if test -n "$GIT_QUIET"
+   then
+   orig_flags="$orig_flags -q"
+   fi
+
if test -n "$init"
then
cmd_init "--" "$@" || return
fi
 
+   if test "$jobs" != 1
+   then
+   if ( echo test | xargs -P "$jobs" true 2>/dev/null )
+   then
+   if ( echo test | xargs --max-lines=1 true 2>/dev/null 
); then
+   max_lines="--max-lines=1"
+   else
+   max_lines="-L 1"
+   fi
+   module_list "$@" | awk '{print $4}' |
+   xargs $max_lines -P "$jobs" git submodule update 
$orig_flags
+   return
+   else
+   echo "Warn: parallel execution is not supported on this 
platform."
+   fi
+   fi
+
cloned_modules=
module_list "$@" | {
err=
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-29 Thread Junio C Hamano
Heiko Voigt  writes:

> On Fri, Jul 27, 2012 at 04:25:58PM -0700, Junio C Hamano wrote:
> ...
>> Of course, any set of rules have exceptions ;-) There are a few
>> things to which we say "Even though it is not in POSIX, everybody
>> who matters supports it, and without taking advantage of it, what we
>> want to achieve will become too cumbersome to express".
>
> I was about to write that since this is limited to a given --jobs
> options the majority platforms should be enough as a start and others
> could add a parallelism mechanism later. Its only a matter of efficiency
> and not features.

As long as "git submodule --jobs 9" on a platform without GNU
enhanced xargs does not error out and gracefully degrade to non
parallel execution, I do not have any problem with it.  As posted,
the patch has not yet achieved that doneness yet.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-29 Thread Jens Lehmann
Am 27.07.2012 20:37, schrieb Stefan Zager:
> The --jobs parameter may be used to set the degree of per-submodule
> parallel execution.

I think this is a sound idea, but it would be good to see some
actual measurements. What are the performance numbers with and
without this change? Which cases do benefit and are there some
which run slower when run in parallel?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-28 Thread Heiko Voigt
Hi,

On Fri, Jul 27, 2012 at 04:25:58PM -0700, Junio C Hamano wrote:
> Stefan Zager  writes:
> 
> > On Fri, Jul 27, 2012 at 2:38 PM, Junio C Hamano  wrote:
> >
> >> Stefan Zager  writes:
> >>
> >> > + module_list "$@" | awk '{print $4}' | xargs -L 1 -P
> >> "$jobs" git submodule update $orig_args
> >>
> >> Capital-P option to xargs is not even in POSIX, no?
> >
> > I wasn't aware of that, but you appear to be correct.  Don't know if you
> > have a policy about that, but anecdotally, -P is supported on my linux,
> > mac, and win/msys systems.
> 
> About "policy", we use POSIX as a rough yardstick to warn us that we
> might be breaking people on minority platforms.  We do _not_ say "It
> is in POSIX, so it is safe to use it", but we say "It is not even in
> POSIX, so we need to think twice."  We do not usually say "Linux,
> Mac and Windows are the only things that matter, and they all
> support it."
> 
> Of course, any set of rules have exceptions ;-) There are a few
> things to which we say "Even though it is not in POSIX, everybody
> who matters supports it, and without taking advantage of it, what we
> want to achieve will become too cumbersome to express".

I was about to write that since this is limited to a given --jobs
options the majority platforms should be enough as a start and others
could add a parallelism mechanism later. Its only a matter of efficiency
and not features.

But if you look at my other post to this thread I described that we need
some UI output extension so the user can still make sense of it.
In short: The user should be able distinguish which job said what.

I was already thinking about how an output caching could be implemented in
core git. How about exposing it as a git command like this?

git run [-j] ...

It works like the xargs call above except that it caches each jobs
output to stderr and stdout until its done and then replays the output
to stderr/out in the correct order.

We could design the code so that it can be reused later on to do the
caching in parallel fetch/push/... .

What do you think? If we decide to go this route I would have a look
into whipping something up.

Cheers Heiko
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-28 Thread Heiko Voigt
Hi Stefan,

neat patch. See below for a few notes.

On Fri, Jul 27, 2012 at 11:37:34AM -0700, Stefan Zager wrote:
> diff --git a/git-submodule.sh b/git-submodule.sh
> index dba4d39..761420a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -491,6 +492,20 @@ cmd_update()
>   -r|--rebase)
>   update="rebase"
>   ;;
> + -j|--jobs)
> + case "$2" in
> + ''|-*)
> + jobs="0"
> + ;;
> + *)
> + jobs="$2"
> + shift
> + ;;
> + esac
> + # Don't preserve this arg.
> + shift
> + continue
> + ;;
>   --reference)
>   case "$2" in '') usage ;; esac
>   reference="--reference=$2"
> @@ -529,6 +544,12 @@ cmd_update()
>   cmd_init "--" "$@" || return
>   fi
>  
> + if test "$jobs" != "1"
> + then
> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git 
> submodule update $orig_args

I do not see orig_args set anywhere in submodule.sh. It seems the
existing usage of it in cmd_status() is a leftover from commit
98dbe63 when this variable got renamed to orig_flags.

I will follow up with a patch to that location.

Another problem here is the passing of arguments. Have a look at
a7eff1a8 to see how this was solved for other locations.

The next thing I noticed is that the parallelism is not recursive. You
drop the option and only execute the first depth in parallel. How about
using the amount of modules defined by arguments left in $@ as an
indicator whether you need to fork parallel execution or not. If there
is exactly one you do the update if there are more you do the parallel
thing. That way you can just keep passing the --jobs flag to the
subprocesses.

The next question to solve is UI: Since the output lines of the parallel
update jobs will be mixed we need some way to distinguish them. Imagine
one of the update fails somewhere how do we find out which it was?

Two possible solutions come to my mind:

 1. Prefix each line with a job number. This way you can distinguish
which process outputted what and still have immediate feedback.

 2. Cache the output (to stderr and stdout) of each job and output it
once one job is done. I imagine this needs some infrastructure which
we need to implement. We already have some ideas how to collect such
output in C here[1].

I would prefer solution 2 since the output of 1 will be hard to read but
I guess we could start with 1 and then move over to 2 later on.

Cheers Heiko

[1] http://article.gmane.org/gmane.comp.version-control.git/197747
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-27 Thread Junio C Hamano
Stefan Zager  writes:

> On Fri, Jul 27, 2012 at 2:38 PM, Junio C Hamano  wrote:
>
>> Stefan Zager  writes:
>>
>> > + module_list "$@" | awk '{print $4}' | xargs -L 1 -P
>> "$jobs" git submodule update $orig_args
>>
>> Capital-P option to xargs is not even in POSIX, no?
>
> I wasn't aware of that, but you appear to be correct.  Don't know if you
> have a policy about that, but anecdotally, -P is supported on my linux,
> mac, and win/msys systems.

About "policy", we use POSIX as a rough yardstick to warn us that we
might be breaking people on minority platforms.  We do _not_ say "It
is in POSIX, so it is safe to use it", but we say "It is not even in
POSIX, so we need to think twice."  We do not usually say "Linux,
Mac and Windows are the only things that matter, and they all
support it."

Of course, any set of rules have exceptions ;-) There are a few
things to which we say "Even though it is not in POSIX, everybody
who matters supports it, and without taking advantage of it, what we
want to achieve will become too cumbersome to express".

In the core parts of the system, we try to be very conservative. In
the fringe where nobody cares about, we tend to be looser.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable parallelism in git submodule update.

2012-07-27 Thread Junio C Hamano
Stefan Zager  writes:

> + module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git 
> submodule update $orig_args

Capital-P option to xargs is not even in POSIX, no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Enable parallelism in git submodule update.

2012-07-27 Thread Stefan Zager
The --jobs parameter may be used to set the degree of per-submodule
parallel execution.

Signed-off-by: Stefan Zager 
---
 Documentation/git-submodule.txt |  8 +++-
 git-submodule.sh| 23 ++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index fbbbcb2..34f81fb 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,8 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
 'git submodule' [--quiet] init [--] [...]
 'git submodule' [--quiet] update [--init] [-N|--no-fetch] [--rebase]
- [--reference ] [--merge] [--recursive] [--] 
[...]
+ [--reference ] [--merge] [--recursive]
+ [-j|--jobs [jobs]] [--] [...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) ]
  [commit] [--] [...]
 'git submodule' [--quiet] foreach [--recursive] 
@@ -147,6 +148,11 @@ If the submodule is not yet initialized, and you just want 
to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 +
+By default, each submodule is treated serially.  You may specify a degree of
+parallel execution with the --jobs flag.  If a parameter is provided, it is
+the maximum number of jobs to run in parallel; without a parameter, all jobs 
are
+run in parallel.
++
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 
diff --git a/git-submodule.sh b/git-submodule.sh
index dba4d39..761420a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -8,7 +8,7 @@ dashless=$(basename "$0" | sed -e 's/-/ /')
 USAGE="[--quiet] add [-b branch] [-f|--force] [--reference ] [--] 
 []
or: $dashless [--quiet] status [--cached] [--recursive] [--] [...]
or: $dashless [--quiet] init [--] [...]
-   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [--] [...]
+   or: $dashless [--quiet] update [--init] [-N|--no-fetch] [-f|--force] 
[--rebase] [--reference ] [--merge] [--recursive] [-j|--jobs 
[jobs]] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
or: $dashless [--quiet] sync [--] [...]"
@@ -473,6 +473,7 @@ cmd_update()
 {
# parse $args after "submodule ... update".
orig_flags=
+   jobs="1"
while test $# -ne 0
do
case "$1" in
@@ -491,6 +492,20 @@ cmd_update()
-r|--rebase)
update="rebase"
;;
+   -j|--jobs)
+   case "$2" in
+   ''|-*)
+   jobs="0"
+   ;;
+   *)
+   jobs="$2"
+   shift
+   ;;
+   esac
+   # Don't preserve this arg.
+   shift
+   continue
+   ;;
--reference)
case "$2" in '') usage ;; esac
reference="--reference=$2"
@@ -529,6 +544,12 @@ cmd_update()
cmd_init "--" "$@" || return
fi
 
+   if test "$jobs" != "1"
+   then
+   module_list "$@" | awk '{print $4}' | xargs -L 1 -P "$jobs" git 
submodule update $orig_args
+   return
+   fi
+
cloned_modules=
module_list "$@" | {
err=
-- 
1.7.11.rc2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html