Re: [PATCH] git-svn: shorten glob error message
Hello all, On 01/14/2016 09:15 PM, Junio C Hamano wrote: > Eric Wong writes: > >> Error messages should attempt to fit within the confines of >> an 80-column terminal to avoid compatibility and accessibility >> problems. Furthermore the word "directories" can be misleading >> when used in the context of git refnames. >> >> Signed-off-by: Eric Wong >> --- >>Eric Wong wrote: >>> I also noticed the "Only one set of wildcard directories" error >>> message is unnecessary long and "wildcard directories" should >>> probably be shortened to "wildcards" to avoid wrapping in a terminal. >>> That will probably be a separate patch for me. >> >>There's likely more instances of this in git-svn, but I figured >>we'll get this one fixed, first. >> >>Also pushed to bogomips.org/git-svn.git >>(commit dc6aa7e61e9d33856f54d63b7acb518383420373) >>along with Victor's patch. > Thanks. > > I am not sure if it is a good idea to show */*/* as an example in > the message (that is an anti-example of 'one set of wildcard' by > having three stars, isn't it?), but that is not a new issue this > change introduces. I agree, this should be changed, however I think this should be done in separate patch. Do we have any questions left open before this could be merged into main git repo? -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] git-svn: add support for prefixed globs in config
Introduce prefixed globs for branches and tags in git-svn. Globs like 'release_*' allow users to avoid long lines in config like: branches = branches/{release_20,release_21,release_22,...} Signed-off-by: Victor Leschuk --- Changes from v3: * Wrapped all test preparations in separate test-cases Documentation/git-svn.txt| 5 ++ perl/Git/SVN/GlobSpec.pm | 9 ++- t/t9168-git-svn-prefixed-glob.sh | 142 +++ 3 files changed, 151 insertions(+), 5 deletions(-) create mode 100755 t/t9168-git-svn-prefixed-glob.sh diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt index 0c0f60b..529cffe 100644 --- a/Documentation/git-svn.txt +++ b/Documentation/git-svn.txt @@ -1034,6 +1034,7 @@ listed below are allowed: url = http://server.org/svn fetch = trunk/project-a:refs/remotes/project-a/trunk branches = branches/*/project-a:refs/remotes/project-a/branches/* + branches = branches/release_*:refs/remotes/project-a/branches/release_* tags = tags/*/project-a:refs/remotes/project-a/tags/* @@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL). This type of configuration is not automatically created by 'init' and should be manually entered with a text-editor or using 'git config'. +Also note that prefixed globs (e.g. 'release_*') match everything after prefix +but do not match exact prefix. For example: +'release_*' will match 'release_1' or 'release_v1' but will not match 'release_'. + It is also possible to fetch a subset of branches or tags by using a comma-separated list of names within braces. For example: diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm index c95f5d7..a136090 100644 --- a/perl/Git/SVN/GlobSpec.pm +++ b/perl/Git/SVN/GlobSpec.pm @@ -11,16 +11,15 @@ sub new { my $die_msg = "Only one set of wildcard directories " . "(e.g. '*' or '*/*/*') is supported: '$glob'\n"; for my $part (split(m|/|, $glob)) { - if ($part =~ /\*/ && $part ne "*") { - die "Invalid pattern in '$glob': $part\n"; - } elsif ($pattern_ok && $part =~ /[{}]/ && + if ($pattern_ok && $part =~ /[{}]/ && $part !~ /^\{[^{}]+\}/) { die "Invalid pattern in '$glob': $part\n"; } - if ($part eq "*") { + if ($part =~ /(\w*)\*/) { die $die_msg if $state eq "right"; $state = "pattern"; - push(@patterns, "[^/]*"); + my $pat = $1 ? "${1}[^/]+" : "[^/]*"; + push(@patterns, $pat); } elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) { die $die_msg if $state eq "right"; $state = "pattern"; diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh new file mode 100755 index 000..1b08e45 --- /dev/null +++ b/t/t9168-git-svn-prefixed-glob.sh @@ -0,0 +1,142 @@ +#!/bin/sh +test_description='git svn globbing refspecs with prefixed globs' +. ./lib-git-svn.sh + +test_expect_success 'prepare test refspec prefixed globbing' ' + cat >expect.end <trunk/src/a/readme && + echo "goodbye world" >trunk/src/b/readme && + svn_cmd import -m "initial" trunk "$svnrepo"/trunk && + svn_cmd co "$svnrepo" tmp && + ( + cd tmp && + mkdir branches tags && + svn_cmd add branches tags && + svn_cmd cp trunk branches/b_start && + svn_cmd commit -m "start a new branch" && + svn_cmd up && + echo "hi" >>branches/b_start/src/b/readme && + poke branches/b_start/src/b/readme && + echo "hey" >>branches/b_start/src/a/readme && + poke branches/b_start/src/a/readme && + svn_cmd commit -m "hi" && + svn_cmd up && + svn_cmd cp branches/b_start tags/t_end && + echo "bye" >>tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + echo "aye" >>tags/t_end/src/a/readme && + pok
[PATCH v3] git-svn: add support for prefixed globs in config
Introduce prefixed globs for branches and tags in git-svn. Globs like 'release_*' allow users to avoid long lines in config like: branches = branches/{release_20,release_21,release_22,...} Signed-off-by: Victor Leschuk --- Changes from v1 (in v2 I forgot to switch from `` to $() ): * Joined implementation and test in one patch * Fixed test code style according to current coding style guide Documentation/git-svn.txt| 5 ++ perl/Git/SVN/GlobSpec.pm | 9 ++- t/t9168-git-svn-prefixed-glob.sh | 136 +++ 3 files changed, 145 insertions(+), 5 deletions(-) create mode 100755 t/t9168-git-svn-prefixed-glob.sh diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt index 0c0f60b..529cffe 100644 --- a/Documentation/git-svn.txt +++ b/Documentation/git-svn.txt @@ -1034,6 +1034,7 @@ listed below are allowed: url = http://server.org/svn fetch = trunk/project-a:refs/remotes/project-a/trunk branches = branches/*/project-a:refs/remotes/project-a/branches/* + branches = branches/release_*:refs/remotes/project-a/branches/release_* tags = tags/*/project-a:refs/remotes/project-a/tags/* @@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL). This type of configuration is not automatically created by 'init' and should be manually entered with a text-editor or using 'git config'. +Also note that prefixed globs (e.g. 'release_*') match everything after prefix +but do not match exact prefix. For example: +'release_*' will match 'release_1' or 'release_v1' but will not match 'release_'. + It is also possible to fetch a subset of branches or tags by using a comma-separated list of names within braces. For example: diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm index c95f5d7..a136090 100644 --- a/perl/Git/SVN/GlobSpec.pm +++ b/perl/Git/SVN/GlobSpec.pm @@ -11,16 +11,15 @@ sub new { my $die_msg = "Only one set of wildcard directories " . "(e.g. '*' or '*/*/*') is supported: '$glob'\n"; for my $part (split(m|/|, $glob)) { - if ($part =~ /\*/ && $part ne "*") { - die "Invalid pattern in '$glob': $part\n"; - } elsif ($pattern_ok && $part =~ /[{}]/ && + if ($pattern_ok && $part =~ /[{}]/ && $part !~ /^\{[^{}]+\}/) { die "Invalid pattern in '$glob': $part\n"; } - if ($part eq "*") { + if ($part =~ /(\w*)\*/) { die $die_msg if $state eq "right"; $state = "pattern"; - push(@patterns, "[^/]*"); + my $pat = $1 ? "${1}[^/]+" : "[^/]*"; + push(@patterns, $pat); } elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) { die $die_msg if $state eq "right"; $state = "pattern"; diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh new file mode 100755 index 000..a5a3227 --- /dev/null +++ b/t/t9168-git-svn-prefixed-glob.sh @@ -0,0 +1,136 @@ +#!/bin/sh +test_description='git svn globbing refspecs with prefixed globs' +. ./lib-git-svn.sh + +cat >expect.end <trunk/src/a/readme && + echo "goodbye world" >trunk/src/b/readme && + svn_cmd import -m "initial" trunk "$svnrepo"/trunk && + svn_cmd co "$svnrepo" tmp && + ( + cd tmp && + mkdir branches tags && + svn_cmd add branches tags && + svn_cmd cp trunk branches/b_start && + svn_cmd commit -m "start a new branch" && + svn_cmd up && + echo "hi" >>branches/b_start/src/b/readme && + poke branches/b_start/src/b/readme && + echo "hey" >>branches/b_start/src/a/readme && + poke branches/b_start/src/a/readme && + svn_cmd commit -m "hi" && + svn_cmd up && + svn_cmd cp branches/b_start tags/t_end && + echo "bye" >>tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + echo "aye" >>tags/t_end/src/a/readm
[PATCH v2] git-svn: add support for prefixed globs in config
Introduce prefixed globs for branches and tags in git-svn. Globs like 'release_*' allow users to avoid long lines in config like: branches = branches/{release_20,release_21,release_22,...} Signed-off-by: Victor Leschuk --- Changes from v1: * Joined implementation and test in one patch * Fixed test code style according to current coding style guide Documentation/git-svn.txt| 5 ++ perl/Git/SVN/GlobSpec.pm | 9 ++- t/t9168-git-svn-prefixed-glob.sh | 136 +++ 3 files changed, 145 insertions(+), 5 deletions(-) create mode 100755 t/t9168-git-svn-prefixed-glob.sh diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt index 0c0f60b..529cffe 100644 --- a/Documentation/git-svn.txt +++ b/Documentation/git-svn.txt @@ -1034,6 +1034,7 @@ listed below are allowed: url = http://server.org/svn fetch = trunk/project-a:refs/remotes/project-a/trunk branches = branches/*/project-a:refs/remotes/project-a/branches/* + branches = branches/release_*:refs/remotes/project-a/branches/release_* tags = tags/*/project-a:refs/remotes/project-a/tags/* @@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL). This type of configuration is not automatically created by 'init' and should be manually entered with a text-editor or using 'git config'. +Also note that prefixed globs (e.g. 'release_*') match everything after prefix +but do not match exact prefix. For example: +'release_*' will match 'release_1' or 'release_v1' but will not match 'release_'. + It is also possible to fetch a subset of branches or tags by using a comma-separated list of names within braces. For example: diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm index c95f5d7..a136090 100644 --- a/perl/Git/SVN/GlobSpec.pm +++ b/perl/Git/SVN/GlobSpec.pm @@ -11,16 +11,15 @@ sub new { my $die_msg = "Only one set of wildcard directories " . "(e.g. '*' or '*/*/*') is supported: '$glob'\n"; for my $part (split(m|/|, $glob)) { - if ($part =~ /\*/ && $part ne "*") { - die "Invalid pattern in '$glob': $part\n"; - } elsif ($pattern_ok && $part =~ /[{}]/ && + if ($pattern_ok && $part =~ /[{}]/ && $part !~ /^\{[^{}]+\}/) { die "Invalid pattern in '$glob': $part\n"; } - if ($part eq "*") { + if ($part =~ /(\w*)\*/) { die $die_msg if $state eq "right"; $state = "pattern"; - push(@patterns, "[^/]*"); + my $pat = $1 ? "${1}[^/]+" : "[^/]*"; + push(@patterns, $pat); } elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) { die $die_msg if $state eq "right"; $state = "pattern"; diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh new file mode 100755 index 000..b8a059b --- /dev/null +++ b/t/t9168-git-svn-prefixed-glob.sh @@ -0,0 +1,136 @@ +#!/bin/sh +test_description='git svn globbing refspecs with prefixed globs' +. ./lib-git-svn.sh + +cat >expect.end <trunk/src/a/readme && + echo "goodbye world" >trunk/src/b/readme && + svn_cmd import -m "initial" trunk "$svnrepo"/trunk && + svn_cmd co "$svnrepo" tmp && + ( + cd tmp && + mkdir branches tags && + svn_cmd add branches tags && + svn_cmd cp trunk branches/b_start && + svn_cmd commit -m "start a new branch" && + svn_cmd up && + echo "hi" >>branches/b_start/src/b/readme && + poke branches/b_start/src/b/readme && + echo "hey" >>branches/b_start/src/a/readme && + poke branches/b_start/src/a/readme && + svn_cmd commit -m "hi" && + svn_cmd up && + svn_cmd cp branches/b_start tags/t_end && + echo "bye" >>tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + echo "aye" >>tags/t_end/src/a/readme && + poke tags/t_end/src/a/readme &a
Re: [PATCH 2/2] git-svn: test for git-svn prefixed globs
Hello Eric, sorry, I just copy pasted your old t9108-git-svn-glob.sh, changed branch names to be prefixed and added test for "exact" prefix match. If it is necessary I can rewrite it according to current guidelines. On 12/17/2015 12:28 AM, Eric Wong wrote: Thanks for this work. Most things look fine with 1/2, comments on 2/2 below... Victor Leschuk wrote: Add test for git-svn prefixed globs. Why a separate patch? Unless there's some documentation purpose for a regression, usually tests and a feature should be added atomically in the same commit. --- /dev/null +++ b/t/t9168-git-svn-prefixed-glob.sh @@ -0,0 +1,136 @@ +#!/bin/sh +test_description='git svn globbing refspecs with prefixed globs' +. ./lib-git-svn.sh + +cat > expect.end < We prefer redirects in new code to be in the form of ">foo" (no space) (or ">>foo" for append). It wasn't in the old tests, either, but Documentation/CodingGuidelines favors this for new code. +the end +hi +start a new branch +initial +EOF All the setup code be checked for errors with '&&' as well. + test "`git rev-parse refs/remotes/tags/t_end~1`" = \ + "`git rev-parse refs/remotes/branches/b_start`" && + test "`git rev-parse refs/remotes/branches/b_start~2`" = \ + "`git rev-parse refs/remotes/trunk`" && And we prefer $(command) instead of `command` for nestability as Documentation/CodingGuidelines suggests. (yeah, most of the old tests don't follow the guidelines, but the guidelines also warn against fixup patches for them). Thanks again. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] git-svn: test for git-svn prefixed globs
Add test for git-svn prefixed globs. Signed-off-by: Victor Leschuk --- t/t9168-git-svn-prefixed-glob.sh | 136 +++ 1 file changed, 136 insertions(+) create mode 100755 t/t9168-git-svn-prefixed-glob.sh diff --git a/t/t9168-git-svn-prefixed-glob.sh b/t/t9168-git-svn-prefixed-glob.sh new file mode 100755 index 000..979ecd9 --- /dev/null +++ b/t/t9168-git-svn-prefixed-glob.sh @@ -0,0 +1,136 @@ +#!/bin/sh +test_description='git svn globbing refspecs with prefixed globs' +. ./lib-git-svn.sh + +cat > expect.end < trunk/src/a/readme && + echo "goodbye world" > trunk/src/b/readme && + svn_cmd import -m "initial" trunk "$svnrepo"/trunk && + svn_cmd co "$svnrepo" tmp && + ( + cd tmp && + mkdir branches tags && + svn_cmd add branches tags && + svn_cmd cp trunk branches/b_start && + svn_cmd commit -m "start a new branch" && + svn_cmd up && + echo "hi" >> branches/b_start/src/b/readme && + poke branches/b_start/src/b/readme && + echo "hey" >> branches/b_start/src/a/readme && + poke branches/b_start/src/a/readme && + svn_cmd commit -m "hi" && + svn_cmd up && + svn_cmd cp branches/b_start tags/t_end && + echo "bye" >> tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + echo "aye" >> tags/t_end/src/a/readme && + poke tags/t_end/src/a/readme && + svn_cmd commit -m "the end" && + echo "byebye" >> tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + svn_cmd commit -m "nothing to see here" + ) && + git config --add svn-remote.svn.url "$svnrepo" && + git config --add svn-remote.svn.fetch \ +"trunk/src/a:refs/remotes/trunk" && + git config --add svn-remote.svn.branches \ +"branches/b_*/src/a:refs/remotes/branches/b_*" && + git config --add svn-remote.svn.tags\ +"tags/t_*/src/a:refs/remotes/tags/t_*" && + git svn multi-fetch && + git log --pretty=oneline refs/remotes/tags/t_end | \ + sed -e "s/^.\{41\}//" > output.end && + test_cmp expect.end output.end && + test "`git rev-parse refs/remotes/tags/t_end~1`" = \ + "`git rev-parse refs/remotes/branches/b_start`" && + test "`git rev-parse refs/remotes/branches/b_start~2`" = \ + "`git rev-parse refs/remotes/trunk`" && + test_must_fail git rev-parse refs/remotes/tags/t_end@3 + ' + +echo try to try > expect.two +echo nothing to see here >> expect.two +cat expect.end >> expect.two + +test_expect_success 'test left-hand-side only prefixed globbing' ' + git config --add svn-remote.two.url "$svnrepo" && + git config --add svn-remote.two.fetch trunk:refs/remotes/two/trunk && + git config --add svn-remote.two.branches \ +"branches/b_*:refs/remotes/two/branches/*" && + git config --add svn-remote.two.tags \ +"tags/t_*:refs/remotes/two/tags/*" && + ( + cd tmp && + echo "try try" >> tags/t_end/src/b/readme && + poke tags/t_end/src/b/readme && + svn_cmd commit -m "try to try" + ) && + git svn fetch two && + test `git rev-list refs/remotes/two/tags/t_end | wc -l` -eq 6 && + test `git rev-list refs/remotes/two/branches/b_start | wc -l` -eq 3 && + test `git rev-parse refs/remotes/two/branches/b_start~2` = \ +`git rev-parse refs/remotes/two/trunk` && + test `git rev-parse refs/remotes/two/tags/t_end~3` = \ +`git rev-parse refs/remotes/two/branches/b_start` && + git log --pretty=oneline refs/remotes/two/tags/t_end | \ + sed -e "s/^.\{41\}//" > output.two && + test_cmp expect.two output.two + ' + +test_expect_success 'test prefixed globs do not match just prefix' ' + git config --add svn-remote.three.url "$svnrepo" && + git config
[PATCH 1/2] git-svn: support for prefixed globs in config
Introduce prefixed globs for branches and tags in git-svn. Signed-off-by: Victor Leschuk --- Documentation/git-svn.txt | 5 + perl/Git/SVN/GlobSpec.pm | 9 - 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt index 0c0f60b..529cffe 100644 --- a/Documentation/git-svn.txt +++ b/Documentation/git-svn.txt @@ -1034,6 +1034,7 @@ listed below are allowed: url = http://server.org/svn fetch = trunk/project-a:refs/remotes/project-a/trunk branches = branches/*/project-a:refs/remotes/project-a/branches/* + branches = branches/release_*:refs/remotes/project-a/branches/release_* tags = tags/*/project-a:refs/remotes/project-a/tags/* @@ -1044,6 +1045,10 @@ independent path component (surrounded by '/' or EOL). This type of configuration is not automatically created by 'init' and should be manually entered with a text-editor or using 'git config'. +Also note that prefixed globs (e.g. 'release_*') match everything after prefix +but do not match exact prefix. For example: +'release_*' will match 'release_1' or 'release_v1' but will not match 'release_'. + It is also possible to fetch a subset of branches or tags by using a comma-separated list of names within braces. For example: diff --git a/perl/Git/SVN/GlobSpec.pm b/perl/Git/SVN/GlobSpec.pm index c95f5d7..a136090 100644 --- a/perl/Git/SVN/GlobSpec.pm +++ b/perl/Git/SVN/GlobSpec.pm @@ -11,16 +11,15 @@ sub new { my $die_msg = "Only one set of wildcard directories " . "(e.g. '*' or '*/*/*') is supported: '$glob'\n"; for my $part (split(m|/|, $glob)) { - if ($part =~ /\*/ && $part ne "*") { - die "Invalid pattern in '$glob': $part\n"; - } elsif ($pattern_ok && $part =~ /[{}]/ && + if ($pattern_ok && $part =~ /[{}]/ && $part !~ /^\{[^{}]+\}/) { die "Invalid pattern in '$glob': $part\n"; } - if ($part eq "*") { + if ($part =~ /(\w*)\*/) { die $die_msg if $state eq "right"; $state = "pattern"; - push(@patterns, "[^/]*"); + my $pat = $1 ? "${1}[^/]+" : "[^/]*"; + push(@patterns, $pat); } elsif ($pattern_ok && $part =~ /^\{(.*)\}$/) { die $die_msg if $state eq "right"; $state = "pattern"; -- 2.7.0.rc0.21.gb793f61 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] git-svn: add support for prefixed globs in config
There are existing old SVN repos which use patterns in branch (and tag) names to indicate some information. For example: branches/release_01, branches/release_02, etc, however non-patterned branches co-exist with them (like branches/dev). If someone maintains git mirror of such a repo it is reasonable to mirror only several branches (for example those who match some pattern) and in current situation it leads to messy and error-prone git config like: branches = branches/{release_20,release_21,release_22,... It would be useful to have an opportunity to write branches = branches/release_* instead of this. Thus I suggest to add support for such 'prefixed' globs into git-svn. Victor Leschuk (2): Introduce prefixed globs for branches and tags in git-svn. Add test for git-svn prefixed globs. Documentation/git-svn.txt| 5 ++ perl/Git/SVN/GlobSpec.pm | 9 ++- t/t9168-git-svn-prefixed-glob.sh | 136 +++ 3 files changed, 145 insertions(+), 5 deletions(-) create mode 100755 t/t9168-git-svn-prefixed-glob.sh -- 2.7.0.rc0.21.gb793f61 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Introduce grep threads param
Hello Junio, On 12/15/2015 11:06 PM, Junio C Hamano wrote: Victor Leschuk writes: Subject: Re: [PATCH 1/2] Introduce grep threads param I'll retitle this to something like grep: add --threads= option and grep.threads configuration while queuing (which I did for v7 earlier). Ok, thanks. "git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Signed-off-by: Victor Leschuk --- ... +grep.threads:: + Number of grep worker threads. "Number of grep worker threads to use"? Accepted, will change. + See `grep.threads` in linkgit:git-grep[1] for more information. ... +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which is using 8 threads for all systems. + Default behavior may change in future versions + to better suit hardware and circumstances. The last sentence is too noisy. Perhaps drop it and phrase it like this instead? grep.threads:: Number of grep worker threads to use. If unset (or set to 0), to 0), 8 threads are used by default (for now). Agree. diff --git a/builtin/grep.c b/builtin/grep.c index 4229cae..e9aebab 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; Please do not initialize static to 0 (or NULL). Ok. @@ -267,6 +270,12 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) int st = grep_config(var, value, cb); if (git_color_default_config(var, value, cb) < 0) st = -1; + + if (!strcmp(var, "grep.threads")) { + /* Sanity check of value will be perfomed later */ Hmm, is that a good design? A user may hear "invalid number of threads specified (-4)" later, but if that came from "grep.threads", especially when the user did not say "--threads=-4" from the command line, would she know to check her configuration file? If she had "grep.threads=Yes" in her configuration, we would helpfully tell her that 'Yes' given to grep.threads is not a valid integer. Shouldn't we do the same for '-4' given to grep.threads, too? if (!strcmp(var, "grep.threads")) { num_threads = git_config_int(var, value); if (num_threads < 0) die(_("invalid number of threads specified (%d) for %s"), num_threads, var); } perhaps. When I was doing this I looked at other code and saw that exact message "Invalid number of threads..." is used in other parts of git and is present in 'po' translations. Thus I decided to use exactly the same message in order not to create numerous almost similar localizations (which we should do if we use two different messages in different places). Do you think we need to create two more different messages (and translations, I can prepare Russian and French) and create two checks for cmd and config? @@ -817,14 +827,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } #ifndef NO_PTHREADS - if (list.nr || cached || online_cpus() == 1) - use_threads = 0; + if (list.nr || cached || online_cpus() == 1 || show_in_pager) { + /* Can not multi-thread object lookup */ + num_threads = 0; Removing 'use_threads = 0' from an earlier part and moving the check to show_in_pager is a good idea, but it invalidates this comment. The earlier three (actually two and a half) are "cannot" cases, i.e. the object layer is not easily threaded without locking, and when you have a single core, you do not truly run multiple operations at the same time, but as [PATCH 2/2] does, threading in "grep" is not about CPU alone, so that is why I am demoting it to just a half ;-). But show_in_pager is "we do not want to", I think. In any case, this comment and "User didn't specify" below are not telling the reader something very much useful. You probably should remove them. Ok, will remove comments. + } + else if (num_threads == 0) { + /* User didn't specify value, or just wants default behavior */ + num_threads = GREP_NUM_THREADS_DEFAULT; + } + else if (num_threads < 0) { + die(_("invalid number of threads specified (%d)"), num_threads); + } Many unnecessary braces. I put braces to make code look more unified. I had to put braces here: + else if (num_threads == 0) { + /* User didn't specify
[PATCH 1/2] Introduce grep threads param
"git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 12 + builtin/grep.c | 49 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 51 insertions(+), 15 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2d06b11..cbf4071 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1450,6 +1450,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads. + See `grep.threads` in linkgit:git-grep[1] for more information. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..25e6dc5 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which is using 8 threads for all systems. + Default behavior may change in future versions + to better suit hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,10 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads. + See `grep.threads` in 'CONFIGURATION' for more information. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index 4229cae..e9aebab 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(*threads)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -267,6 +270,12 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) int st = grep_config(var, value, cb); if (git_color_default_config(var, value, cb) < 0) st = -1; + + if (!strcmp(var, "grep.threads")) { + /* Sanity check of value will be perfomed later */ + num_threads = git_config_int(var, value); + } + return st; } @@ -294,7 +303,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, } #ifndef NO_PTHREADS - if (use_threads) { + if (num_threads) { add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
[PATCH v8 0/2] Add git-grep threads param
Introducing v8 of git-grep threads param patch. Patch is now split in 2 parts - 1/2 is actually renewed v6 version (see list of changes below), 2/2 removes dependency on online_cpus() - as we discussed with Eric this is rather significant change in default behavior and should be placed into separate patch. Here is list of changes since v6 ($gmane/281160): * Fixed broken t7811: moved all threads_num setup to 1 place (for -O option it was in wrong place) * Fixed 'invalid number of threads' message so that it could be translated * Got rid of grep_threads_config() - its too trivial to be separate function * Fixed xcalloc() args (sizeof(pthread_t) -> sizeof(*threads)) to correspond to general git style * Improved commit message (in 2/2) to explain why online_cpus() is now not used in threads_num setup * The full param documentation was moved into single place (grep.threads description in git-grep.txt) and is referenced from other places. Also made few language improvements in documentation. * Style improvements: splitted too long lines Victor Leschuk (2): "git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 12 + builtin/grep.c | 49 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 51 insertions(+), 15 deletions(-) -- 2.6.3.369.g3e7f205.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] Get rid of online_cpus() when determining grep threads num
Number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Reason: multithreading can improve performance even on single core machines as IO is also a major factor here. Using multiple threads can significantly boost grep performance when working on slow filesystems (or repo isn't cached) or through network (for example repo is located on NFS). Signed-off-by: Victor Leschuk --- builtin/grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/grep.c b/builtin/grep.c index e9aebab..1315905 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -827,7 +827,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } #ifndef NO_PTHREADS - if (list.nr || cached || online_cpus() == 1 || show_in_pager) { + if (list.nr || cached || show_in_pager) { /* Can not multi-thread object lookup */ num_threads = 0; } -- 2.6.3.369.g3e7f205.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7] Add git-grep threads param
Hello Eric: Hmm, v7 doesn't seem to address any of the v6 review comments here[1]. Was that review merely overlooked or did you disagree with the reviewer? [1]: http://article.gmane.org/gmane.comp.version-control.git/281817 Sorry, it looks like I missed that letter. My bad. History of changes from the first version ($gmane/280053/: It's generally more helpful to reviewers if you provide a link to the previous version and describe the changes since that version. (If you prefer, it's also okay to describe all changes, but they should be organized by version so that it's obvious which changes were made in which version.) I will prepare v8 and list changes starting from v6. Please kindly disregard this one. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v7] Add git-grep threads param
"git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Changes to default behavior: number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Reason: multithreading can improve performance even on single core machines as IO is also a major factor here. Using multiple threads can significantly boost grep performance when working on slow filesystems (or repo isn't cached) or through network (for example repo is located on NFS). Signed-off-by: Victor Leschuk --- History of changes from the first version ($gmane/280053/: * Param renamed from threads-num to threads * Short version of '--threads' cmd key was removed * Made num_threads 'decision-tree' more obvious and easy to edit for future use ($gmane/280089) * Moved option description to more suitable place in documentation ($gmane/280188) * Hid threads param from 'external' grep.c, made it private for builtin/grep.c ($gmane/280188) * Improved num_threads 'decision-tree', got rid of dependency on online_cpus ($gmane/280299) * Improved param documentation ($gmane/280299) * Fixed broken t7811: moved all threads_num setup to 1 place (for -O option it was in wrong place) ($gmane/281160) * Fixed 'invalid number of threads' message so that it could be translated ($gmane/281160) * Got rid of grep_threads_config() - its too trivial to be separate function ($gmane/281160) * Fixed xcalloc() args (sizeof(pthread_t) -> sizeof(*threads)) to correspond to general git style ($gmane/281160) * Improved commit message to explain why online_cpus() is now not used in threads_num setup ($gmane/281160) Documentation/config.txt | 7 ++ Documentation/git-grep.txt | 15 builtin/grep.c | 42 ++ contrib/completion/git-completion.bash | 1 + 4 files changed, 50 insertions(+), 15 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2d06b11..687b9ad 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1450,6 +1450,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..8222a83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,13 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index 4229cae..756b6af 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread
RE: [PATCH v6] Add git-grep threads param
Hello all, does anybody have time to review this patch? PS Sorry for bothering =) -- Best Regards, Victor From: Victor Leschuk [vlesc...@gmail.com] Sent: Wednesday, November 11, 2015 03:52 To: git@vger.kernel.org Cc: Victor Leschuk Subject: [PATCH v6] Add git-grep threads param "git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Changes to default behavior: number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Signed-off-by: Victor Leschuk --- History of changes from the first version ($gmane/280053/: * Param renamed from threads-num to threads * Short version of '--threads' cmd key was removed * Made num_threads 'decision-tree' more obvious and easy to edit for future use ($gmane/280089) * Moved option description to more suitable place in documentation ($gmane/280188) * Hid threads param from 'external' grep.c, made it private for builtin/grep.c ($gmane/280188) * Improved num_threads 'decision-tree', got rid of dependency on online_cpus ($gmane/280299) * Improved param documentation ($gmane/280299) Documentation/config.txt | 7 + Documentation/git-grep.txt | 15 ++ builtin/grep.c | 50 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 57 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..5084e36 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..8222a83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,13 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..f0e3dfb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void st
Re: What's cooking in git.git (Nov 2015, #03; Fri, 20)
On 11/20/2015 05:09 PM, Jeff King wrote: * vl/grep-configurable-threads (2015-11-01) 1 commit - grep: add --threads= option and grep.threads configuration "git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. I haven't reviewed v6 yet. More eyes are welcome. Actually v6 includes only changes to special meaning of "0" (0 is now default behavior - 8 threads), little corrections to documentation and getting rid of online_cpus() in decision-tree. All as discussed in comments for v4/5. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v6] Add git-grep threads param
Hello all, The earlier version of this patch is already included in /pu branch, however as we all agreed ($gmane/280299) we have changed the default behavior and the meaning of "0". The question is: what is the right way to include changes from patch v6 (this one) into already merged patch to pu? Thanks. -- Best Regards, Victor ____ From: Victor Leschuk [vlesc...@gmail.com] Sent: Wednesday, November 11, 2015 03:52 To: git@vger.kernel.org Cc: Victor Leschuk Subject: [PATCH v6] Add git-grep threads param "git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Changes to default behavior: number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Signed-off-by: Victor Leschuk --- History of changes from the first version ($gmane/280053/: * Param renamed from threads-num to threads * Short version of '--threads' cmd key was removed * Made num_threads 'decision-tree' more obvious and easy to edit for future use ($gmane/280089) * Moved option description to more suitable place in documentation ($gmane/280188) * Hid threads param from 'external' grep.c, made it private for builtin/grep.c ($gmane/280188) * Improved num_threads 'decision-tree', got rid of dependency on online_cpus ($gmane/280299) * Improved param documentation ($gmane/280299) Documentation/config.txt | 7 + Documentation/git-grep.txt | 15 ++ builtin/grep.c | 50 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 57 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..5084e36 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..8222a83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,13 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..f0e3dfb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads)
[PATCH v6] Add git-grep threads param
"git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Changes to default behavior: number of threads now doesn't depend on online_cpus(), e.g. if specific number is not configured GREP_NUM_THREADS_DEFAULT (8) threads will be used even on 1-core CPU. Signed-off-by: Victor Leschuk --- History of changes from the first version ($gmane/280053/: * Param renamed from threads-num to threads * Short version of '--threads' cmd key was removed * Made num_threads 'decision-tree' more obvious and easy to edit for future use ($gmane/280089) * Moved option description to more suitable place in documentation ($gmane/280188) * Hid threads param from 'external' grep.c, made it private for builtin/grep.c ($gmane/280188) * Improved num_threads 'decision-tree', got rid of dependency on online_cpus ($gmane/280299) * Improved param documentation ($gmane/280299) Documentation/config.txt | 7 + Documentation/git-grep.txt | 15 ++ builtin/grep.c | 50 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 57 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..5084e36 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..8222a83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,13 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suit hardware and circumstances. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..f0e3dfb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt
[PATCH v5] Add git-grep threads param
"git grep" can now be configured (or told from the command line) how many threads to use when searching in the working tree files. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 7 + Documentation/git-grep.txt | 15 ++ builtin/grep.c | 50 +++--- contrib/completion/git-completion.bash | 1 + 4 files changed, 57 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..467fa7b 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suite hardware and circumstances. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..91027b6 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,13 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suite hardware and circumstances. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +235,13 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Number of grep worker threads, use it to tune up performance on + your machines. Leave it unset (or set to 0) for default behavior, + which for now is using 8 threads for all systems. + Default behavior can be changed in future versions + to better suite hardware and circumstances. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..f0e3dfb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = 0; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -262,10 +265,19 @@ static int wait_all(void) } #endif +static int grep_threads_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "grep.threads")) + num_threads = git_config_int(var, value); /* Sanity check of value wil
RE: [PATCH v4] Add git-grep threads param
Correct. I think adding the option (both to command line and to config file) is good, as long as the IO issues are documented. And default to just the fixed number of threads for now - and with the option, maybe people can then more easily try out different scenarios and maybe improve on the particular choice of fixed number of threads. > Or maybe the default should be something that isn't even described by any > particular fixed number. > For example, maybe the default that value could be something quite dynamic: > start off with a single thread (or a very low thread number) and just set a > timer. After 0.1s, if CPU usage is low, start more threads. After another > 0.1s, if that improved things, maybe > we could add still more threads... > Note that "CPU usage is low" can be hard to get portably, but we could > approximate it with "how much work did we actually get done". If we only > grepped a couple of files, that might be because of IO issues. And if speed > does not improve when we move from a > single thread to, say, four threads, then we should probably *not* increase > the thread number again at 0.2s. > So I think there are many possible avenues to explore that might be > interesting. I do *not* think that "online_cpus()" is one of them, > except > perhaps as a very rough measure of "is this a beefy system or not" (but even > that is questionable - 32 CPUs is definitely > likely "very beefy, so use lots of threads", but even 8 CPUs might still be > just a phone, and I'm not sure that tells you a lot, really. Linus Here is my version of note for Documentaion: Number of grep worker threads, use it to tune up performance on your machines. Leave it unset or set to "0" if you want to use default number (currently default number is 8 for all systems, however this behavior can be changed in future versions to better suite your hardware and circumstances). -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
. > In the meantime I'd argue for just getting rid of the online_cpu's > check, because > (a) I think it's actively misleading > (b) the threaded grep probably doesn't hurt much even on a single > CPU, and the _potential_ upside from IO could easily dwarf the cost. > (c) do developers actually have single-core machines any more? Linus So as far as I understood your point at current moment would be better to leave online_cpus() check behind, keep the default threads value (and do so for other threaded programs, in separate patches of course). After that we may focus (in future) on developing smarter heuristics for parallelity-relationed params. Also we should specify in documentation that number of your CPUs may not the optimal value and customer should find his own best values based on other circumstances. Correct? -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk wrote: > > Maybe use the simplest version (and keep num_numbers == 0 also as flag for > all other checks in code like if(num_flags) ): > > if (list.nr || cached ) > num_threads = 0; // do not use threads > else if (num_threads == 0) > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; > I will say this AGAIN. > The number of threads is *not* about the number of CPU's. Stop this craziness. It's wrong. Actually I have never said the nCPUs played main role in it. The patch is intended to provide user ability to change this threads number according to their needs and to touch as small amount of other code as possible. > The number of threads is about parallelism. Yes, CPU's is a small part > of it. But as mentioned earlier, the *big* wins are for slow > filesystems, NFS in particular. On NFS, even if you have things > cached, the cache revalidation is going to cause network traffic > almost every time. Being able to have several of those outstanding is > a big deal. > So stop with the "online_cpus()" stuff. And don't base your benchmarks > purely on the CPU-bound case. Because the CPU-bound case is the case > that is already generally so good that few people will care all *that* > deeply. I have performed a cold-cached FS test (in previous thread to minimize the CPU part in the results) and it showed high correlation between speed and thread_num. Isn't it what you said? Even on systems with small number of cores we can gain profit of multithreading. That's I why I suggest this param to be customizable and not HARDCODED. We need to create a clear text for the documentation that this number should not based on number of CPU-s only. Currently do not mention anything on it. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
> if (list.nr || cached) >num_threads = 1; > if (!num_threads) > num_threads = GREP_NUM_THREADS_DEFAULT; > and then later, instead of use_threads, do: > if (num_threads <= 1) { ... do single-threaded version ... > } else { ... do multi-threaded version ... > } > That matches the logic in builtin/pack-objects.c. Maybe use the simplest version (and keep num_numbers == 0 also as flag for all other checks in code like if(num_flags) ): if (list.nr || cached ) num_threads = 0; // do not use threads else if (num_threads == 0) num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; else if (num_threads < 0) die(...) // here we are num_threads > zero, so do nothing -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
> Why don't we leave it at 8, then? That's the conservative choice, and > once we have --threads, people can easily experiment with different > values and we can follow-up with a change to the default if need be. I'd propose the following: if (list.nr || cached) { num_threads = 0; /* Can not multi-thread object lookup */ } else if (num_threads < 0 && online_cpus() <= 1) { num_threads = 0; /* User didn't set threading option and we have <= 1 of hardware cores */ } else if (num_threads == 0) { num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose default behavior */ } else if (num_threads < 0) { /* Actually this one should be checked earlier so no need to double check here */ die(_("Ivalid number of threads specified (%d)"), num_threads) } -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
From: Jeff King [p...@peff.net] Sent: Tuesday, November 03, 2015 22:40 To: Junio C Hamano Cc: Victor Leschuk; git@vger.kernel.org; Victor Leschuk; torva...@linux-foundation.org; j...@keeping.me.uk Subject: Re: [PATCH v4] Add git-grep threads param On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote: > > +grep.threads:: > > + Number of grep worker threads, use it to tune up performance on > > + multicore machines. Default value is 8. Set to 0 to disable threading. > > + > > I am not enthused by this "Set to 0 to disable". As Zero is > magical, it would be more useful if 1 meant that threading is not > used (i.e. there is only 1 worker), and 0 meant that we would > automatically pick some reasonable parallelism for you (and we > promise that the our choice would not be outrageously wrong), or > something like that. > Not just useful, but consistent with other parts of git, like > pack.threads, where "0" already means "autodetect". Hello Peff and Junio, Yeah do also think it would be more reasonable to use "0" for "autodetect" default value. However chat this autodetect value should be? For index index-pack and pack-objects we use ncpus() for this, however according to my tests this wouldn't an Ideal for all cases. Maybe it should be something like ncpus()*2, anyway before it we even used hard-coded 8 for all systems... In this case we use "1" as "Do not use threads" and die on all negative numbers during parsing. Agreed? -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
>> do we have any objections on this patch? > The question you should be asking is "do we have any support". Hello all, CCing participated reviewers. As Junio has correctly mentioned: "Do we have any support for including this functionality?" I think this kind of customization can be useful in tuning up performance as confirmed by conducted tests. And in most cases having anything hard-coded is worse than giving users opportunity to change it, isn't it? -- Best Regards, Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
Hello all, do we have any objections on this patch? -- Best Regards, Victor From: Victor Leschuk [vlesc...@gmail.com] Sent: Tuesday, October 27, 2015 14:22 To: git@vger.kernel.org Cc: Victor Leschuk Subject: [PATCH v4] Add git-grep threads param Make number of git-grep worker threads a configuration parameter. According to several tests on systems with different number of CPU cores the hard-coded number of 8 threads is not optimal for all systems: tuning this parameter can significantly speed up grep performance. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 9 ++ builtin/grep.c | 56 -- contrib/completion/git-completion.bash | 1 + 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1dd2a61 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..e766596 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +232,10 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Set number of worker threads to . Default is 8. + Set to 0 to disable threading. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..694553e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = -1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -262,10 +265,22 @@ static int wait_all(void) } #endif +static int grep_threads_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "grep.threads")) { + num_threads = git_config_int(var, value); + if (num_threads < 0) + die("Invalid number of threads specified (%d)", num_threads); + } + return 0
RE: [PATCH 3/6] Facilitate debugging Git executables in tests with gdb
> > +if test -n "$TEST_GDB_GIT" > > +then > > + exec gdb -args "${GIT_EXEC_PATH}/@@PROG@@" "$@" > Maybe we could make $TEST_GDB_GIT not just a boolean flag? It would be useful > to contain "gdb" executable name. It would allow to set path to GDB when it > is not in $PATH, set different debuggers (for example, I usually use cgdb), > or even set it to /path/to/gdb_wrapper.sh which could contain different gdb > options and tunings. > Sure, as long as TEST_GDB_GIT=1 still works. Why don't you make an add-on > patch and submit it? Hello Johannes, Sure, I will prepare the patch as soon as this one is included in master. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] Facilitate debugging Git executables in tests with gdb
+if test -n "$TEST_GDB_GIT" +then + exec gdb -args "${GIT_EXEC_PATH}/@@PROG@@" "$@" Maybe we could make $TEST_GDB_GIT not just a boolean flag? It would be useful to contain "gdb" executable name. It would allow to set path to GDB when it is not in $PATH, set different debuggers (for example, I usually use cgdb), or even set it to /path/to/gdb_wrapper.sh which could contain different gdb options and tunings. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug: Segfault when doing "git diff"
On 10/28/2015 04:35 PM, Mathias L. Baumann wrote: I was using the latest git version 2.6.2 already. I suspect it is due to a .gitconfig. This is what is probably required: ➜ ~ cat .gitconfig [diff] submodule = log Yep, that did the trick. The segfault is from sha1_file.c: /* add the alternate entry */ *alt_odb_tail = ent; /* <= alt_obd_tail is NULL here */ alt_odb_tail = &(ent->next); ent->next = NULL; Will try to take a closer look at it. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug: Segfault when doing "git diff"
On 10/28/2015 02:58 PM, Mathias L. Baumann wrote: Hello dear git devs, I just stumbled upon a segfault when doing just "git diff" in my repo. I managed to create a minimal repo setup where the bug is reproducable. The problem seems to be a mix of having an untracked submodule and having set an alternates file for one submodule. Attached you'll find my setup that will reproduce the problem. Simply run 'git diff' in bugtest1. In case the attachment is refused, I also uploaded it here: http://supraverse.net/bugdemo.tar.gz cheers, --Marenz Hello Marenz, I have just tried to reproduce segfault with the provided archive: [del@del-debian bugtest1 (master)]$ git diff diff --git a/submodules/bugtest2 b/submodules/bugtest2 --- a/submodules/bugtest2 +++ b/submodules/bugtest2 @@ -1 +1 @@ -Subproject commit cd0b9ee2946d2df3626943347332a4d86f93b126 +Subproject commit cd0b9ee2946d2df3626943347332a4d86f93b126-dirty No segfault occured. I am using git version 2.6.2.308.g3b8f10c Could you please specify which version of git you are using and also try to reproduce it with latest 2.6.2? -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] Add git-grep threads param
Make number of git-grep worker threads a configuration parameter. According to several tests on systems with different number of CPU cores the hard-coded number of 8 threads is not optimal for all systems: tuning this parameter can significantly speed up grep performance. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 9 ++ builtin/grep.c | 56 -- contrib/completion/git-completion.bash | 1 + 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1dd2a61 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..e766596 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +232,10 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Set number of worker threads to . Default is 8. + Set to 0 to disable threading. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..694553e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = -1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(&grep_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -262,10 +265,22 @@ static int wait_all(void) } #endif +static int grep_threads_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "grep.threads")) { + num_threads = git_config_int(var, value); + if (num_threads < 0) + die("Invalid number of threads specified (%d)", num_threads); + } + return 0; +} + static int grep_cmd_config(const char *var, const char *value, void *cb) { int st = grep_config(var, value, cb); - if (git_color_default_config(var, value, cb) < 0) + if (grep_threads_config(var, value, cb) < 0) + st = -1; + els
RE: [PATCH v3] Add git-grep threads param
Hello John, >> This thought also crossed my mind, however we already pass grep_opt to >> start_threads() function, so I think passing it to wait_all() is not >> that ugly, and kind of symmetric. And I do not like the idea of >> duplicating same information in different places. What do you think? > The grep_opt in start_threads() is being passed through to run(), so it > seems slightly different to me. If the threads were being setup in > grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in > grep_opt, but since this is local to this particular user of the grep > infrastructure adding num_threads to the grep_opt structure at all feels > wrong to me. > Note that I wasn't suggesting passing num_threads as a parameter to > wait_all(), but rather having it as global state that is accessed by > wait_all() in the same way as the `threads` array. > If we rename use_threads to num_threads and just use that, then we only > have the information in one place don't we? Yeah, I understood your idea. So we parse config_value directly to static int num_threads; /* old use_threads */ And use it internally in builtin/grep.c. I think you are right. Looks like grep_cmd_config() is the right place to parse it. Something like: --- a/builtin/grep.c +++ b/builtin/grep.c @@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt) static int grep_cmd_config(const char *var, const char *value, void *cb) { int st = grep_config(var, value, cb); + if (thread_config(var, value, cb) < 0) + st = -1; if (git_color_default_config(var, value, cb) < 0) st = -1; return st; What do you think? -- Best Regards, Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3] Add git-grep threads param
Hello Linus, >> According to several tests on systems with different number of CPU cores >> the hard-coded number of 8 threads is not optimal for all systems: > Did you also compare cold-cache filesystem performance? > One of the reasons for doing threaded grep is for CPU scaling. But another > is for IO scaling. If your git tree is over NFS, doing grep eight threads at > a time if likely going to make things much faster even if you are on a single > CPU. Yes, I have performed tests on cold-cache FS and it looks like number of threads affects performance. Here are the results for grepping linux kernel repo on a 4-core machine (similar test was conducted on 8-core machine): Threads: 4 Time: 39.13 Threads: 8 Time: 34.39 Threads: 16 Time: 31.46 Threads: 32 Time: 27.40 Here is test scenario: #!/bin/bash TIMEFORMAT=%R GIT=/home/del/git-dev/bin/git TESTS=10 for n in 4 8 16 32; do echo -n "Threads: $n Time: " for i in $(seq 1 $TESTS); do echo 3 > /proc/sys/vm/drop_caches time $GIT grep --threads $n -e '#define' --and \( -e MAX_PATH -e PATH_MAX \) >/dev/null done 2>&1 | awk -v ntests=${TESTS} '{sum+=$1} END{printf "%.2f\n", sum/ntests}' done Note: With hot-cache grepping with 4 threads gives fastest results on both 4-core and 8-core machines. Thus I think it can be useful for users to be able to tune the threads number according to their needs. -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3] Add git-grep threads param
Hello John, see comments inline. >> @@ -22,6 +22,7 @@ SYNOPSIS >> [--color[=] | --no-color] >> [--break] [--heading] [-p | --show-function] >> [-A ] [-B ] [-C ] >> +[--threads ] > Is this the best place for this option? I know the current list isn't > sorted in any particular way, but here you're splitting up the set of > context options (`-A`, `-B`, `-C` and `-W`). Agree, I'll move the option both here and in documentation. >> -static int wait_all(void) >> +static int wait_all(struct grep_opt *opt) > I'm not sure passing a grep_opt in here is the cleanest way to do this. > Options are a UI concept and all we care about here is the number of > threads. > Since `threads` is a global, shouldn't the number of threads be a global > as well? Could we reuse `use_threads` here (possibly renaming it > `num_threads`)? This thought also crossed my mind, however we already pass grep_opt to start_threads() function, so I think passing it to wait_all() is not that ugly, and kind of symmetric. And I do not like the idea of duplicating same information in different places. What do you think? -- Best Regards, Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] Add git-grep threads param
Make number of git-grep worker threads a configuration parameter. According to several tests on systems with different number of CPU cores the hard-coded number of 8 threads is not optimal for all systems: tuning this parameter can significantly speed up grep performance. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 Documentation/git-grep.txt | 4 builtin/grep.c | 34 ++ contrib/completion/git-completion.bash | 1 + grep.c | 10 ++ grep.h | 2 ++ 6 files changed, 47 insertions(+), 8 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1c95587 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..fbd4f83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -22,6 +22,7 @@ SYNOPSIS [--color[=] | --no-color] [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] + [--threads ] [-W | --function-context] [-f ] [-e] [--and|--or|--not|(|)|-e ...] @@ -220,6 +221,9 @@ OPTIONS Show leading lines, and place a line containing `--` between contiguous groups of matches. +--threads :: + Set number of worker threads to . Default is 8. + -W:: --function-context:: Show the surrounding text from the previous line containing a diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..5ef1b07 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -27,8 +27,7 @@ static char const * const grep_usage[] = { static int use_threads = 1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(opt->num_threads, sizeof(pthread_t)); + for (i = 0; i < opt->num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt) } } -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { int hit = 0; int i; @@ -238,12 +238,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < opt->num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -256,7 +258,7 @@ static int wait_all(void) } #else /* !NO_PTHREADS */ -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { return 0; } @@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("show context lines before matches")), OPT_INTEGER('A', "after-context", &opt.post_context, N_("show context lines after matches")), + OPT_INTEGER(0, "threads", &opt.num_threads, + N_("use worker threads")), OPT_NUMBER_CALLBACK(&opt, N_("shortcut for -C NUM"), context_callback), OPT_BOOL('p', "show-function", &opt.funcname, @@ -832,8 +836,22 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } #ifndef NO_PTHREADS - if (list.nr || cached || online_cpus() == 1) + if (!opt.num_threads) { + use_threads = 0; /* User explicitely told not to use threads */ + } + else if (list.nr || cached) { + use_threads = 0; /* Can not multi-thread object lookup */ + } + else if (opt.num_threads &
Re: git --literal-pathspecs add -u says "fatal: pathspec ':/' did not match any files"
Hello Noam, The problem is that in the absence of explicit argument we set the list of files to special path ":/" which means repo root: if ((0 < addremove_explicit || take_worktree_changes) && !argc) { static const char *whole[2] = { ":/", NULL }; argc = 1; argv = whole; } And after that we treat it as regular file if (!seen[i] && path[0] && ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) || !file_exists(path))) { /* < file_exists() here just checks lstat() result */ Maybe it'll make sense to modify file_exists() to treat ":/" specially. Something like that: diff --git a/dir.c b/dir.c index 109ceea..6cae3b9 100644 --- a/dir.c +++ b/dir.c @@ -2103,6 +2103,10 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru int file_exists(const char *f) { struct stat sb; + if (!strcmp(f, ":/")) { + /* ":/" - root dir, always exists */ + return 1; + } return lstat(f, &sb) == 0; } -- Victor On 10/24/2015 02:39 AM, Noam Postavsky wrote: ~/tmp/tmprepo$ git init Initialized empty Git repository in /home/npostavs/tmp/tmprepo/.git/ ~/tmp/tmprepo$ git --literal-pathspecs add -u fatal: pathspec ':/' did not match any files ~/tmp/tmprepo$ git --version git version 2.6.1 It was reported[1] that 2.0.2 and several following versions also fail with the same error; I found that version 1.9.5 succeeds. Adding a "." argument: git --literal-pathspecs add -u . succeeds in all versions. [1]: https://github.com/magit/magit/issues/2354#issuecomment-150665961 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Add git-grep threads-num param
It's a follow up to "[PATCH] Add git-grep threads-num param": Make number of git-grep worker threads a configuration parameter. I have run several tests on systems with different number of CPU cores. It appeared that the hard-coded number 8 lowers performance on both of my systems: on my 4-core and 8-core systems the thread number of 4 worked about 20% faster than default 8. So I think it is better to allow users tune this parameter. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 Documentation/git-grep.txt | 4 builtin/grep.c | 20 contrib/completion/git-completion.bash | 1 + grep.c | 11 +++ grep.h | 2 ++ 6 files changed, 34 insertions(+), 8 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1c95587 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..fbd4f83 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -22,6 +22,7 @@ SYNOPSIS [--color[=] | --no-color] [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] + [--threads ] [-W | --function-context] [-f ] [-e] [--and|--or|--not|(|)|-e ...] @@ -220,6 +221,9 @@ OPTIONS Show leading lines, and place a line containing `--` between contiguous groups of matches. +--threads :: + Set number of worker threads to . Default is 8. + -W:: --function-context:: Show the surrounding text from the previous line containing a diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..3950725 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -27,8 +27,7 @@ static char const * const grep_usage[] = { static int use_threads = 1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(opt->num_threads, sizeof(pthread_t)); + for (i = 0; i < opt->num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt) } } -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { int hit = 0; int i; @@ -238,12 +238,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < opt->num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -256,7 +258,7 @@ static int wait_all(void) } #else /* !NO_PTHREADS */ -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { return 0; } @@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("show context lines before matches")), OPT_INTEGER('A', "after-context", &opt.post_context, N_("show context lines after matches")), + OPT_INTEGER(0, "threads", &opt.num_threads, + N_("use worker threads")), OPT_NUMBER_CALLBACK(&opt, N_("shortcut for -C NUM"), context_callback), OPT_BOOL('p', "show-function", &opt.funcname, @@ -832,7 +836,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } #ifndef NO_PTHREADS - if (list.nr || cached || online_cpus() == 1) + if (list.nr || cached || online_cpus() == 1 || opt.num_threads <= 1) use_threads = 0; #else
RE: [PATCH] Add git-grep threads-num param
Hello John, I haven't noticed the --threads option in pack-objects, I will fix the patch to make naming more uniform. Do you have any comments regarding the functionality? -- Best Regards, Victor From: John Keeping [j...@keeping.me.uk] Sent: Thursday, October 22, 2015 7:23 AM To: Victor Leschuk Cc: git@vger.kernel.org; Victor Leschuk Subject: Re: [PATCH] Add git-grep threads-num param On Thu, Oct 22, 2015 at 04:23:56PM +0300, Victor Leschuk wrote: > Hello all, I suggest we make number of git-grep worker threads a configuration > parameter. I have run several tests on systems with different number of CPU > cores. > It appeared that the hard-coded number 8 lowers performance on both of my > systems: > on my 4-core and 8-core systems the thread number of 4 worked about 20% > faster than > default 8. So I think it is better to allow users tune this parameter. For git-pack-objects we call the command-line parameter "--threads" and the config variable "pack.threads". Is there a reason not to use the same name here (i.e. "--threads" and "grep.threads")? > Signed-off-by: Victor Leschuk > --- > Documentation/config.txt | 4 > Documentation/git-grep.txt | 5 + > builtin/grep.c | 20 +--- > contrib/completion/git-completion.bash | 1 + > grep.c | 15 +++ > grep.h | 4 > 6 files changed, 42 insertions(+), 7 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 391a0c3..c3df20c 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: > option is ignored when the 'grep.patternType' option is set to a value > other than 'default'. > > +grep.threadsNum:: > + Number of grep worker threads, use it to tune up performance on > + multicore machines. Default value is 8. > + > gpg.program:: > Use this custom program instead of "gpg" found on $PATH when > making or verifying a PGP signature. The program must support the > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 4a44d6d..e9ca265 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -22,6 +22,7 @@ SYNOPSIS > [--color[=] | --no-color] > [--break] [--heading] [-p | --show-function] > [-A ] [-B ] [-C ] > +[-t ] > [-W | --function-context] > [-f ] [-e] > [--and|--or|--not|(|)|-e ...] > @@ -220,6 +221,10 @@ OPTIONS > Show leading lines, and place a line containing > `--` between contiguous groups of matches. > > +-t :: > +--threads-num :: > + Set number of worker threads to . Default is 8. > + > -W:: > --function-context:: > Show the surrounding text from the previous line containing a > diff --git a/builtin/grep.c b/builtin/grep.c > index d04f440..9b4fc47 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -27,8 +27,7 @@ static char const * const grep_usage[] = { > static int use_threads = 1; > > #ifndef NO_PTHREADS > -#define THREADS 8 > -static pthread_t threads[THREADS]; > +static pthread_t *threads; > > /* We use one producer thread and THREADS consumer > * threads. The producer adds struct work_items to 'todo' and the > @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt) > strbuf_init(&todo[i].out, 0); > } > > - for (i = 0; i < ARRAY_SIZE(threads); i++) { > + threads = xmalloc(sizeof(pthread_t) * opt->num_threads); > + for (i = 0; i < opt->num_threads; i++) { > int err; > struct grep_opt *o = grep_opt_dup(opt); > o->output = strbuf_out; > @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt) > } > } > > -static int wait_all(void) > +static int wait_all(struct grep_opt *opt) > { > int hit = 0; > int i; > @@ -238,12 +238,14 @@ static int wait_all(void) > pthread_cond_broadcast(&cond_add); > grep_unlock(); > > - for (i = 0; i < ARRAY_SIZE(threads); i++) { > + for (i = 0; i < opt->num_threads; i++) { > void *h; > pthread_join(threads[i], &h); > hit |= (int) (intptr_t) h; > } > > + free(threads); > + > pthread_mutex_destroy(&grep_mutex); > pthread_mutex_destroy(&grep_read_mutex); > pthread_mutex_destroy(&grep_attr_mutex); > @@ -256,7
[PATCH] Add git-grep threads-num param
Hello all, I suggest we make number of git-grep worker threads a configuration parameter. I have run several tests on systems with different number of CPU cores. It appeared that the hard-coded number 8 lowers performance on both of my systems: on my 4-core and 8-core systems the thread number of 4 worked about 20% faster than default 8. So I think it is better to allow users tune this parameter. Signed-off-by: Victor Leschuk --- Documentation/config.txt | 4 Documentation/git-grep.txt | 5 + builtin/grep.c | 20 +--- contrib/completion/git-completion.bash | 1 + grep.c | 15 +++ grep.h | 4 6 files changed, 42 insertions(+), 7 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..c3df20c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threadsNum:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..e9ca265 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -22,6 +22,7 @@ SYNOPSIS [--color[=] | --no-color] [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] + [-t ] [-W | --function-context] [-f ] [-e] [--and|--or|--not|(|)|-e ...] @@ -220,6 +221,10 @@ OPTIONS Show leading lines, and place a line containing `--` between contiguous groups of matches. +-t :: +--threads-num :: + Set number of worker threads to . Default is 8. + -W:: --function-context:: Show the surrounding text from the previous line containing a diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..9b4fc47 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -27,8 +27,7 @@ static char const * const grep_usage[] = { static int use_threads = 1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init(&todo[i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xmalloc(sizeof(pthread_t) * opt->num_threads); + for (i = 0; i < opt->num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt) } } -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { int hit = 0; int i; @@ -238,12 +238,14 @@ static int wait_all(void) pthread_cond_broadcast(&cond_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < opt->num_threads; i++) { void *h; pthread_join(threads[i], &h); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(&grep_mutex); pthread_mutex_destroy(&grep_read_mutex); pthread_mutex_destroy(&grep_attr_mutex); @@ -256,7 +258,7 @@ static int wait_all(void) } #else /* !NO_PTHREADS */ -static int wait_all(void) +static int wait_all(struct grep_opt *opt) { return 0; } @@ -702,6 +704,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("show context lines before matches")), OPT_INTEGER('A', "after-context", &opt.post_context, N_("show context lines after matches")), +#ifndef NO_PTHREADS + OPT_INTEGER('t', "threads-num", &opt.num_threads, + N_("use worker threads")), +#endif /* !NO_PTHREADS */ OPT_NUMBER_CALLBACK(&opt, N_("shortcut for -C NUM"), context_callback), OPT_BOOL('p', "show-function", &opt.funcname, @@ -910,7 +916,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } if (use_threads) - hit |= wait_all(); + hit |= wait_all(&opt); if (hit && show_in_pager) run_pager(&
RE: thread-utils: build with NO_PTHREADS fails
Hello Junio, sorry that was my fault, I was building it wrong way (defined NO_PTHREADS in CFLAGS variable, not as separate make variable). Sorry for the false alarm. -- Best Regards, Victor From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano [gits...@pobox.com] Sent: Monday, October 12, 2015 10:55 AM To: Victor Leschuk Cc: git@vger.kernel.org; vlesc...@gmail.com Subject: Re: thread-utils: build with NO_PTHREADS fails Junio C Hamano writes: > Victor Leschuk writes: > >> I think that no one tried it for a long time but I needed a >> single-threaded git version for debug purpose. I tried to build >> with -DNO_PTHREADS and thread-utils.c failed to compile. >> >> In brief the situation is the following: >> >> in header file we have something like that: >> >> >> #ifndef NO_PTHREAD >> extern int online_cpus(void); >> >> #else >> #define online_cpus() 1 >> #endif // NO_PTHREAD >> >> and in *.c file: >> >> >> int online_cpus(void) >> { >> // ... >> } > > Yeah, that is obviously incorrect. > ... Well, no, I spoke too early. I do not see there is much wrong here. There is this bit in the Makefile: ifdef NO_PTHREADS BASIC_CFLAGS += -DNO_PTHREADS else BASIC_CFLAGS += $(PTHREAD_CFLAGS) EXTLIBS += $(PTHREAD_LIBS) LIB_OBJS += thread-utils.o endif The source file thread-utils.c is not compiled to thread-utils.o if you say NO_PTHREADS, and the resulting libgit.a does not of course have it. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
thread-utils: build with NO_PTHREADS fails
Hello all, I think that no one tried it for a long time but I needed a single-threaded git version for debug purpose. I tried to build with -DNO_PTHREADS and thread-utils.c failed to compile. In brief the situation is the following: in header file we have something like that: #ifndef NO_PTHREAD extern int online_cpus(void); #else #define online_cpus() 1 #endif // NO_PTHREAD and in *.c file: int online_cpus(void) { // ... } So the compilation fails with: test.c:3:21: error: macro "online_cpus" passed 1 arguments, but takes just 0 int online_cpus(void) That's a tiny issue, but maybe we could apply a straight-forward solution (see attached diff)? If you agree I'll prepare a properly-formatted [PATCH] submit. -- Best Regards, Victordiff --git a/thread-utils.c b/thread-utils.c index a2135e0..f3e90fb 100644 --- a/thread-utils.c +++ b/thread-utils.c @@ -20,6 +20,7 @@ int online_cpus(void) { +#ifndef NO_PTHREADS #ifdef _SC_NPROCESSORS_ONLN long ncpus; #endif @@ -58,11 +59,13 @@ int online_cpus(void) return (int)ncpus; #endif +#endif return 1; } int init_recursive_mutex(pthread_mutex_t *m) { +#ifndef NO_PTHREADS pthread_mutexattr_t a; int ret; @@ -74,4 +77,7 @@ int init_recursive_mutex(pthread_mutex_t *m) pthread_mutexattr_destroy(&a); } return ret; +#else + return 0; +#endif } diff --git a/thread-utils.h b/thread-utils.h index d9a769d..6fb98c3 100644 --- a/thread-utils.h +++ b/thread-utils.h @@ -7,9 +7,5 @@ extern int online_cpus(void); extern int init_recursive_mutex(pthread_mutex_t*); -#else - -#define online_cpus() 1 - #endif #endif /* THREAD_COMPAT_H */
RE: [PATCH] git-svn: make batch mode optional for git-cat-file
Hello Eric, Thanks for all the advices. I have played with several repositories (both on 32bit and 64bit machines). You were correct most of the memory if used by mapped files and yes it doesn't cause any problems, even a 32bit machine with 500Mb of memory works normally with a heavy loaded git-cat-file. Thanks also for the advice to use git gc config options, I tested gc.auto=0 and it lead to the same behavior as my setting MALLOC_LIMIT, however it is more correct way to get this effect =) I agree that we shouldn't worry about mapped files. -- Best Regards, Victor From: Eric Wong [normalper...@yhbt.net] Sent: Wednesday, September 23, 2015 12:22 PM To: Victor Leschuk Cc: Junio C Hamano; git@vger.kernel.org Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file Victor Leschuk wrote: > Hello Eric, thanks for looking into it. > > >> git-cat-file has outgrown the parent perl process several times > >> (git-cat-file - ~3-4Gb, perl - 400Mb). > > > Ugh, that sucks. > > Even the 400Mb size of Perl annoys me greatly and I'd work > > on fixing it if I had more time. > > I was going to look at this problem also, but first I'd like to improve the > situation with cat-file as on large repos it is larger problem. By the way, > what direction would you suggest to begin with? See below :) > > > git-cat-file has outgrown the parent perl process several times > > > (git-cat-file - ~3-4Gb, perl - 400Mb). > > > How much of that is anonymous memory, though? > > Haven't measured on this particular repo: didn't redo the 2 week > experiment =) However I checked on a smaller test repo and anon memory > is about 12M out of 40M total. Most of memory is really taken by > mmaped *.pack and *idx files. If it's mmap-ed files, that physical memory is only used on-demand and can be dropped at any time because it's backed by disk. In other words, I would not worry about any file-backed mmap at all (unless you're on 32-bit, but I think git has workarounds for that) Do you still have that giant repo around? Are the combined size of the pack + idx files are at least 3-4 GB? This should cat all the blobs in history without re-running git-svn: git log --all --raw -r --no-abbrev | \ awk '/^:/ {print $3; print $4}' | git cat-file --batch git log actually keeps growing, but the cat-file process shouldn't use anonymous memory much if you inspect it with pmap. > Actually I accidentally found out that if I export GIT_MALLOC_LIMIT > variable set to several megabytes it has the following effect: > * git-svn.perl launches git-gc > * git-gc can't allocate enough memory and thus doesn't create any pack files > * git-cat-file works only with pure blob object, not packs, and it's > memory usage doesn't grow larger than 4-5M > > It gave me a thought that maybe we could get rid of "git gc" calls > after each commit in perl code and just perform one large gc operation > at the end. It will cost disk space during clone but save us memory. > What do you think? You can set gc.auto to zero in your $GIT_CONFIG to disable gc. The "git gc" calls were added because unpacked repos were growing too large and caused problems for other people. Perhaps play with some other pack* options documented in Documentation/config to limit maximum pack size/depth. Is this a 32-bit or 64-bit system? > As for your suggestion regarding periodic restart of batch process > inside git-cat-file, I think we could give it a try, I can prepare a > patch and run some tests. I am not sure if we need it for git-svn. In another project, the only reason I've found to restart "cat-file --batch" is in case the repo got repacked and old packs got unlinked, cat-file would hold a reference onto the old file and suck up space. It might be better if "cat-file --batch" learned to detect unlinked files and then munmap + close them. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] git-svn: make batch mode optional for git-cat-file
Hello Eric, thanks for looking into it. >> git-cat-file has outgrown the parent perl process several times >> (git-cat-file - ~3-4Gb, perl - 400Mb). > Ugh, that sucks. > Even the 400Mb size of Perl annoys me greatly and I'd work > on fixing it if I had more time. I was going to look at this problem also, but first I'd like to improve the situation with cat-file as on large repos it is larger problem. By the way, what direction would you suggest to begin with? > A few more questions: > * What is the largest file that existed in that repo? About 2.5M > * Did you try "MALLOC_MMAP_THRESHOLD_" with glibc? Have just tried it on a smaller repo (which takes about 1 hour to clone and RSS grows from 4M to 40M during the process. Unfortunately there is no much of an effect: max RSS is 41M with default settings and 38M with MALLOC_MMAP_THRESHOLD_=131072. > If alloc.c is the culprit, I would consider to transparently restart "cat-file --batch" once it grows to a certain size or after a certain number of requests are made to it. alloc.c interface is not used in cat-file at all, only direct calls to xmalloc and xrealloc from wrapper.c, and also xmmap() from sha1_file.c. > > git-cat-file has outgrown the parent perl process several times > > (git-cat-file - ~3-4Gb, perl - 400Mb). > How much of that is anonymous memory, though? Haven't measured on this particular repo: didn't redo the 2 week experiment =) However I checked on a smaller test repo and anon memory is about 12M out of 40M total. Most of memory is really taken by mmaped *.pack and *idx files. Actually I accidentally found out that if I export GIT_MALLOC_LIMIT variable set to several megabytes it has the following effect: * git-svn.perl launches git-gc * git-gc can't allocate enough memory and thus doesn't create any pack files * git-cat-file works only with pure blob object, not packs, and it's memory usage doesn't grow larger than 4-5M It gave me a thought that maybe we could get rid of "git gc" calls after each commit in perl code and just perform one large gc operation at the end. It will cost disk space during clone but save us memory. What do you think? As for your suggestion regarding periodic restart of batch process inside git-cat-file, I think we could give it a try, I can prepare a patch and run some tests. -- Best Regards, Victor ________ From: Eric Wong [normalper...@yhbt.net] Sent: Tuesday, September 22, 2015 5:35 PM To: Victor Leschuk Cc: Junio C Hamano; git@vger.kernel.org Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file Eric Wong wrote: > Victor Leschuk wrote: > > The thing is that git-cat-file keeps growing during work when running > > in "batch" mode. See the figure attached: it is for cloning a rather > > small repo (1 hour to clone about ~14000 revisions). However the clone > > of a large repo (~28 revisions) took about 2 weeks and > > git-cat-file has outgrown the parent perl process several times > > (git-cat-file - ~3-4Gb, perl - 400Mb). How much of that is anonymous memory, though? (pmap $PID_OF_GIT_CAT_FILE) Running the following on the Linux kernel tree I had lying around: (for i in $(seq 100 200); do git ls-files | sed -e "s/^/HEAD~$i:/"; done)|\ git cat-file --batch >/dev/null Reveals about 510M RSS in top, but pmap says less than 20M of that is anonymous. So the rest are mmap-ed packfiles; that RSS gets transparently released back to the kernel under memory pressure. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] git-svn: make batch mode optional for git-cat-file
As for your remark regarding the option naming: > An option whose name begins with no- looks somewhat strange. You can even say --no-no-cat-file-batch from the command line, I suspect. We already do have some of these: 'no-metadata', 'no-checkout', 'no-auth-cache'. So I was just following the existing convention. Do you think we need to change it and stick with --catch-file-batch=1/--cat-file-batch=0 ? -- Best Regards, Victor ________ From: Victor Leschuk Sent: Monday, September 21, 2015 3:03 PM To: Junio C Hamano Cc: git@vger.kernel.org Subject: RE: [PATCH] git-svn: make batch mode optional for git-cat-file Hello Junio, thanks for your review. First of all I'd like to apologize for sending the patch without description. Actually I was in a hurry and sent it by accident: I planned to edit the mail before sending... Here is the detailed description: Last week we had a quick discussion in this mailing list: http://thread.gmane.org/gmane.comp.version-control.git/278021 . The thing is that git-cat-file keeps growing during work when running in "batch" mode. See the figure attached: it is for cloning a rather small repo (1 hour to clone about ~14000 revisions). However the clone of a large repo (~28 revisions) took about 2 weeks and git-cat-file has outgrown the parent perl process several times (git-cat-file - ~3-4Gb, perl - 400Mb). What was done: * I have run it under valgrind and mtrace and haven't found any memory leaks * Found the source of most number of memory reallocations (batch_object_write() function (strbuf_expand -> realloc)) - tried to make the streambuf object static and avoid reallocs - didn't help * Tried preloading other allocators than standard glibc - no significant difference After that I replaced the batch mode with separate cat-file calls for each blob and it didn't have any impact on clone performance on real code repositories. However I created a fake test repo with large number of small files (~10 bytes each): here is how I created it https://bitbucket.org/vleschuk/svngenrepo And on this artificial test repo it really slowed down the process. So I decided to suggest to make the batch mode optional to let the user "tune" the process and created a patch for this. As for your code-style notes, I agree with them, will adjust the code. -- Best Regards, Victor From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano [gits...@pobox.com] Sent: Monday, September 21, 2015 11:25 AM To: Victor Leschuk Cc: git@vger.kernel.org; Victor Leschuk Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file Victor Leschuk writes: > Signed-off-by: Victor Leschuk > --- Before the S-o-b line is a good place to explain why this is a good change to have. Please use it. > git-svn.perl | 1 + > perl/Git.pm | 41 - > 2 files changed, 41 insertions(+), 1 deletion(-) > > diff --git a/git-svn.perl b/git-svn.perl > index 36f7240..b793c26 100755 > --- a/git-svn.perl > +++ b/git-svn.perl > @@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => > \$Git::SVN::_follow_parent, > 'use-log-author' => \$Git::SVN::_use_log_author, > 'add-author-from' => \$Git::SVN::_add_author_from, > 'localtime' => \$Git::SVN::_localtime, > + 'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; }, An option whose name begins with no- looks somewhat strange. You can even say --no-no-cat-file-batch from the command line, I suspect. Why not give an option 'cat-file-batch' that sets the variable $Git::cat_file_batch to false, and initialize the variable to true to keep existing users who do not pass the option happy? > %remote_opts ); > > my ($_trunk, @_tags, @_branches, $_stdlayout); > diff --git a/perl/Git.pm b/perl/Git.pm > index 19ef081..69e5293 100644 > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR); > use Time::Local qw(timegm); > } > > +our $no_cat_file_batch = 0; > > =head1 CONSTRUCTORS > > @@ -1012,6 +1013,10 @@ returns the number of bytes printed. > =cut > > sub cat_blob { > + (1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_); Discard "1 ==" here. You are clearly using the variable as a boolean, so writing this as $cat_file_batch ? _cat_blob_batch(@_) : _cat_blob_cmd(@_); or better yet if ($cat_file_batch) { _cat_blob_batch(@_); } else { _cat_blob_cmd(@_); } would be more natural. > +} > + > +sub _cat_blob_batch { > my ($self, $sha1, $f
git-svn: Why not use git-fast-import?
Hello all, I've been playing with git-svn for some time already and as it seems to me there are two most important problems which make it hard to use in production for large repositories. Very low speed and large memory footprint of synchronization with SVN repos (I am talking about clone and fetch operations mostly). I was wondering why the git-fast-import is not used for these purposes? Are there any serious limitations which make it impossible? I have found several alternative solutions which use git-fast-import but they all do only the initial import of a repo. I have looked through the documentation and didn't see why fast-import can't be used to sync an existing repo after the import. Am I missing something? Thanks in advance for clarification. -- Best Regards, Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] git-svn: make batch mode optional for git-cat-file
Hello Junio, thanks for your review. First of all I'd like to apologize for sending the patch without description. Actually I was in a hurry and sent it by accident: I planned to edit the mail before sending... Here is the detailed description: Last week we had a quick discussion in this mailing list: http://thread.gmane.org/gmane.comp.version-control.git/278021 . The thing is that git-cat-file keeps growing during work when running in "batch" mode. See the figure attached: it is for cloning a rather small repo (1 hour to clone about ~14000 revisions). However the clone of a large repo (~28 revisions) took about 2 weeks and git-cat-file has outgrown the parent perl process several times (git-cat-file - ~3-4Gb, perl - 400Mb). What was done: * I have run it under valgrind and mtrace and haven't found any memory leaks * Found the source of most number of memory reallocations (batch_object_write() function (strbuf_expand -> realloc)) - tried to make the streambuf object static and avoid reallocs - didn't help * Tried preloading other allocators than standard glibc - no significant difference After that I replaced the batch mode with separate cat-file calls for each blob and it didn't have any impact on clone performance on real code repositories. However I created a fake test repo with large number of small files (~10 bytes each): here is how I created it https://bitbucket.org/vleschuk/svngenrepo And on this artificial test repo it really slowed down the process. So I decided to suggest to make the batch mode optional to let the user "tune" the process and created a patch for this. As for your code-style notes, I agree with them, will adjust the code. -- Best Regards, Victor From: Junio C Hamano [jch2...@gmail.com] On Behalf Of Junio C Hamano [gits...@pobox.com] Sent: Monday, September 21, 2015 11:25 AM To: Victor Leschuk Cc: git@vger.kernel.org; Victor Leschuk Subject: Re: [PATCH] git-svn: make batch mode optional for git-cat-file Victor Leschuk writes: > Signed-off-by: Victor Leschuk > --- Before the S-o-b line is a good place to explain why this is a good change to have. Please use it. > git-svn.perl | 1 + > perl/Git.pm | 41 - > 2 files changed, 41 insertions(+), 1 deletion(-) > > diff --git a/git-svn.perl b/git-svn.perl > index 36f7240..b793c26 100755 > --- a/git-svn.perl > +++ b/git-svn.perl > @@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => > \$Git::SVN::_follow_parent, > 'use-log-author' => \$Git::SVN::_use_log_author, > 'add-author-from' => \$Git::SVN::_add_author_from, > 'localtime' => \$Git::SVN::_localtime, > + 'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; }, An option whose name begins with no- looks somewhat strange. You can even say --no-no-cat-file-batch from the command line, I suspect. Why not give an option 'cat-file-batch' that sets the variable $Git::cat_file_batch to false, and initialize the variable to true to keep existing users who do not pass the option happy? > %remote_opts ); > > my ($_trunk, @_tags, @_branches, $_stdlayout); > diff --git a/perl/Git.pm b/perl/Git.pm > index 19ef081..69e5293 100644 > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR); > use Time::Local qw(timegm); > } > > +our $no_cat_file_batch = 0; > > =head1 CONSTRUCTORS > > @@ -1012,6 +1013,10 @@ returns the number of bytes printed. > =cut > > sub cat_blob { > + (1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_); Discard "1 ==" here. You are clearly using the variable as a boolean, so writing this as $cat_file_batch ? _cat_blob_batch(@_) : _cat_blob_cmd(@_); or better yet if ($cat_file_batch) { _cat_blob_batch(@_); } else { _cat_blob_cmd(@_); } would be more natural. > +} > + > +sub _cat_blob_batch { > my ($self, $sha1, $fh) = @_; > > $self->_open_cat_blob_if_needed(); > @@ -1072,7 +1077,7 @@ sub cat_blob { > sub _open_cat_blob_if_needed { > my ($self) = @_; > > - return if defined($self->{cat_blob_pid}); > + return if ( defined($self->{cat_blob_pid}) || 1 == $no_cat_file_batch ); Likewise. return if (!$cat_file_batch); return if defined($self->{cat_blob_pid}); > +sub _cat_blob_cmd { > + my ($self, $sha1, $fh) = @_; > +... The biggest thing that is missing from this patch is the explanation of why this is a good thing to do. The batch interface was invented because people found that it was wasteful to spawn
[PATCH] git-svn: make batch mode optional for git-cat-file
Signed-off-by: Victor Leschuk --- git-svn.perl | 1 + perl/Git.pm | 41 - 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/git-svn.perl b/git-svn.perl index 36f7240..b793c26 100755 --- a/git-svn.perl +++ b/git-svn.perl @@ -139,6 +139,7 @@ my %fc_opts = ( 'follow-parent|follow!' => \$Git::SVN::_follow_parent, 'use-log-author' => \$Git::SVN::_use_log_author, 'add-author-from' => \$Git::SVN::_add_author_from, 'localtime' => \$Git::SVN::_localtime, + 'no-cat-file-batch' => sub { $Git::no_cat_file_batch = 1; }, %remote_opts ); my ($_trunk, @_tags, @_branches, $_stdlayout); diff --git a/perl/Git.pm b/perl/Git.pm index 19ef081..69e5293 100644 --- a/perl/Git.pm +++ b/perl/Git.pm @@ -107,6 +107,7 @@ use Fcntl qw(SEEK_SET SEEK_CUR); use Time::Local qw(timegm); } +our $no_cat_file_batch = 0; =head1 CONSTRUCTORS @@ -1012,6 +1013,10 @@ returns the number of bytes printed. =cut sub cat_blob { + (1 == $no_cat_file_batch) ? _cat_blob_cmd(@_) : _cat_blob_batch(@_); +} + +sub _cat_blob_batch { my ($self, $sha1, $fh) = @_; $self->_open_cat_blob_if_needed(); @@ -1072,7 +1077,7 @@ sub cat_blob { sub _open_cat_blob_if_needed { my ($self) = @_; - return if defined($self->{cat_blob_pid}); + return if ( defined($self->{cat_blob_pid}) || 1 == $no_cat_file_batch ); ($self->{cat_blob_pid}, $self->{cat_blob_in}, $self->{cat_blob_out}, $self->{cat_blob_ctx}) = @@ -1090,6 +1095,40 @@ sub _close_cat_blob { delete @$self{@vars}; } +sub _cat_blob_cmd { + my ($self, $sha1, $fh) = @_; + + my $size = $self->command_oneline('cat-file', '-s', $sha1); + + if (!defined $size) { + carp "cat-file couldn't detect object size"; + return -1; + } + + my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1); + + my $blob; + my $bytesLeft = $size; + + while (1) { + last unless $bytesLeft; + + my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024; + my $read = read($in, $blob, $bytesToRead); + unless (defined($read)) { + $self->command_close_pipe($in, $c); + throw Error::Simple("in pipe went bad"); + } + unless (print $fh $blob) { + $self->command_close_pipe($in, $c); + throw Error::Simple("couldn't write to passed in filehandle"); + } + $bytesLeft -= $read; + } + + $self->command_close_pipe($in, $c); + return $size; +} =item credential_read( FILEHANDLE )
RE: git-svn: cat-file memory usage
Hello Jeff, thanks for the advice. Unfortunately using patch didn't change the situation. I will run some tests with alternate allocators (looking at jemalloc and tcmalloc). As for alternate tools: as far as I understood svn2git calls 'git svn' itself. So I assume it can't fix the memory usage or speed up clone process... Correct me if I'm wrong. Reposurgeon looks interesting... Will give it a try. Btw, what do you think of getting rid of batch mode for clone/fetch in perl code. It really hardly has any impact on performance but reduces memory usage a lot. -- Best Regards, Victor From: Jeff King [p...@peff.net] Sent: Wednesday, September 16, 2015 4:56 AM To: Victor Leschuk Cc: git@vger.kernel.org Subject: Re: git-svn: cat-file memory usage On Wed, Sep 16, 2015 at 04:00:48AM -0700, Victor Leschuk wrote: > * git svn clone of trac takes about 1 hour > * git svn clone of FreeBSD has already taken more than 3 days and > still running (currently has cloned about 40% of revisions) I haven't worked with git-svn in a long time, but I doubt that it is the fastest way to do a large repository import. You might want to look into a tool like svn2git or reposurgeon to do the initial import. > I have valgrind'ed the git-cat-file (which is running is --batch mode > during the whole clone) and found no serious leaks (about 100 bytes > definitely leaked), so all memory is carefully freed, but the heap > usage grows maybe due to fragmentation or smth else. When I looked > through the code I found out that most of heap allocations are called > from batch_object_write() function (strbuf_expand -> realloc). Certainly we will call strbuf_expand once per object. I would have expected we would call read_sha1_file(), too. It looks like we always try to stream blobs, but I think we have to fall back to reading the whole object if there are deltas. You can try this patch, which will reuse the same strbuf over and over: diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 07baad1..73f338c 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -256,7 +256,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d static void batch_object_write(const char *obj_name, struct batch_options *opt, struct expand_data *data) { - struct strbuf buf = STRBUF_INIT; + static struct strbuf buf = STRBUF_INIT; if (sha1_object_info_extended(data->sha1, &data->info, LOOKUP_REPLACE_OBJECT) < 0) { printf("%s missing\n", obj_name ? obj_name : sha1_to_hex(data->sha1)); @@ -264,10 +264,10 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt, return; } + strbuf_reset(&buf); strbuf_expand(&buf, opt->format, expand_format, data); strbuf_addch(&buf, '\n'); batch_write(opt, buf.buf, buf.len); - strbuf_release(&buf); if (opt->print_contents) { print_object_or_die(opt, data); That will reduce your reallocs due to strbuf_expand, though I'm doubtful that it will solve the problem (and if it does, I think the right solution is probably to look into using a better allocator than what your system malloc() is providing). > * In perl code do not run git cat-file in batch mode (in > Git::SVN::apply_textdelta) but rather run it as separate commands > each time > >my $size = $self->command_oneline('cat-file', '-s', $sha1); ># . >my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1); > > The second approach doesn't slow down the whole process at all (~72 > minutes to clone repo both with --batch mode and without). I'm surprised the startup cost of the process doesn't make an impact, but maybe it gets lost in the noise of the rest of the work (AFAICT, the point of this cat-file is to retrieve a blob, apply a delta to it, and then write out the resulting object; that write is probably a lot more expensive). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git-svn: cat-file memory usage
Hello all, We are currently getting acquainted with git-svn tool and have experienced few problems with it. The main issue is memory usage during "git svn clone": on large repositories the perl and git processes are using significant amount of memory. I have conducted several tests with different repositories. I have created mirrors of Trac project (http://trac.edgewall.org/ - rather small repo, ~14000 commits) and FreeBSD base repo (~28 commits). Here is the summary of my tests (to eliminate network issues all clones were performed for file:// repos): * git svn clone of trac takes about 1 hour * git svn clone of FreeBSD has already taken more than 3 days and still running (currently has cloned about 40% of revisions) * git cat-file process memory footprint keeps growing during the clone process (see figure attached) The main issue here is git cat-file consuming memory. The attached figure is for small repository which takes about an hour to clone, however on my another machine where FreeBSD clone is currently running the git cat-file has already taken more than 1Gb of memory (RSS) and has overgrown the parent perl process (~300-400 Mb). I have valgrind'ed the git-cat-file (which is running is --batch mode during the whole clone) and found no serious leaks (about 100 bytes definitely leaked), so all memory is carefully freed, but the heap usage grows maybe due to fragmentation or smth else. When I looked through the code I found out that most of heap allocations are called from batch_object_write() function (strbuf_expand -> realloc). So I have found two possible workarounds for the issue: * Set GIT_ALLOC_LIMIT variable - it does reduce the memory footprint but slows down the process * In perl code do not run git cat-file in batch mode (in Git::SVN::apply_textdelta) but rather run it as separate commands each time my $size = $self->command_oneline('cat-file', '-s', $sha1); # . my ($in, $c) = $self->command_output_pipe('cat-file', 'blob', $sha1); The second approach doesn't slow down the whole process at all (~72 minutes to clone repo both with --batch mode and without). So the question is: what would be the correct approach to fight the problem with cat-file memory usage: maybe we should get rid of batch mode in perl code, or somehow tune allocation policy in C code? Please let me know your thoughts. -- Best Regards, Victor Leschuk