RE: [PATCH v3] Add git-grep threads param
Hello Linus, >> According to several tests on systems with different number of CPU cores >> the hard-coded number of 8 threads is not optimal for all systems: > Did you also compare cold-cache filesystem performance? > One of the reasons for doing threaded grep is for CPU scaling. But another > is for IO scaling. If your git tree is over NFS, doing grep eight threads at > a time if likely going to make things much faster even if you are on a single > CPU. Yes, I have performed tests on cold-cache FS and it looks like number of threads affects performance. Here are the results for grepping linux kernel repo on a 4-core machine (similar test was conducted on 8-core machine): Threads: 4 Time: 39.13 Threads: 8 Time: 34.39 Threads: 16 Time: 31.46 Threads: 32 Time: 27.40 Here is test scenario: #!/bin/bash TIMEFORMAT=%R GIT=/home/del/git-dev/bin/git TESTS=10 for n in 4 8 16 32; do echo -n "Threads: $n Time: " for i in $(seq 1 $TESTS); do echo 3 > /proc/sys/vm/drop_caches time $GIT grep --threads $n -e '#define' --and \( -e MAX_PATH -e PATH_MAX \) >/dev/null done 2>&1 | awk -v ntests=${TESTS} '{sum+=$1} END{printf "%.2f\n", sum/ntests}' done Note: With hot-cache grepping with 4 threads gives fastest results on both 4-core and 8-core machines. Thus I think it can be useful for users to be able to tune the threads number according to their needs. -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Add git-grep threads param
On Mon, Oct 26, 2015 at 10:25:41PM -0700, Victor Leschuk wrote: > >> @@ -22,6 +22,7 @@ SYNOPSIS > >> [--color[=] | --no-color] > >> [--break] [--heading] [-p | --show-function] > >> [-A ] [-B ] [-C ] > >> +[--threads ] > > > Is this the best place for this option? I know the current list isn't > > sorted in any particular way, but here you're splitting up the set of > > context options (`-A`, `-B`, `-C` and `-W`). > > Agree, I'll move the option both here and in documentation. > > >> -static int wait_all(void) > >> +static int wait_all(struct grep_opt *opt) > > > I'm not sure passing a grep_opt in here is the cleanest way to do this. > > Options are a UI concept and all we care about here is the number of > > threads. > > > Since `threads` is a global, shouldn't the number of threads be a global > > as well? Could we reuse `use_threads` here (possibly renaming it > > `num_threads`)? > > This thought also crossed my mind, however we already pass grep_opt to > start_threads() function, so I think passing it to wait_all() is not > that ugly, and kind of symmetric. And I do not like the idea of > duplicating same information in different places. What do you think? The grep_opt in start_threads() is being passed through to run(), so it seems slightly different to me. If the threads were being setup in grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in grep_opt, but since this is local to this particular user of the grep infrastructure adding num_threads to the grep_opt structure at all feels wrong to me. Note that I wasn't suggesting passing num_threads as a parameter to wait_all(), but rather having it as global state that is accessed by wait_all() in the same way as the `threads` array. If we rename use_threads to num_threads and just use that, then we only have the information in one place don't we? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3] Add git-grep threads param
Hello John, >> This thought also crossed my mind, however we already pass grep_opt to >> start_threads() function, so I think passing it to wait_all() is not >> that ugly, and kind of symmetric. And I do not like the idea of >> duplicating same information in different places. What do you think? > The grep_opt in start_threads() is being passed through to run(), so it > seems slightly different to me. If the threads were being setup in > grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in > grep_opt, but since this is local to this particular user of the grep > infrastructure adding num_threads to the grep_opt structure at all feels > wrong to me. > Note that I wasn't suggesting passing num_threads as a parameter to > wait_all(), but rather having it as global state that is accessed by > wait_all() in the same way as the `threads` array. > If we rename use_threads to num_threads and just use that, then we only > have the information in one place don't we? Yeah, I understood your idea. So we parse config_value directly to static int num_threads; /* old use_threads */ And use it internally in builtin/grep.c. I think you are right. Looks like grep_cmd_config() is the right place to parse it. Something like: --- a/builtin/grep.c +++ b/builtin/grep.c @@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt) static int grep_cmd_config(const char *var, const char *value, void *cb) { int st = grep_config(var, value, cb); + if (thread_config(var, value, cb) < 0) + st = -1; if (git_color_default_config(var, value, cb) < 0) st = -1; return st; What do you think? -- Best Regards, Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Add git-grep threads param
On Tue, Oct 27, 2015 at 06:54:16AM -0700, Victor Leschuk wrote: > Hello John, > > >> This thought also crossed my mind, however we already pass grep_opt to > >> start_threads() function, so I think passing it to wait_all() is not > >> that ugly, and kind of symmetric. And I do not like the idea of > >> duplicating same information in different places. What do you think? > > > The grep_opt in start_threads() is being passed through to run(), so it > > seems slightly different to me. If the threads were being setup in > > grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in > > grep_opt, but since this is local to this particular user of the grep > > infrastructure adding num_threads to the grep_opt structure at all feels > > wrong to me. > > > Note that I wasn't suggesting passing num_threads as a parameter to > > wait_all(), but rather having it as global state that is accessed by > > wait_all() in the same way as the `threads` array. > > > If we rename use_threads to num_threads and just use that, then we only > > have the information in one place don't we? > > Yeah, I understood your idea. So we parse config_value directly to > > static int num_threads; /* old use_threads */ Presumably this is: static int num_threads = -1; so that the default behaviour continues to work correctly. > And use it internally in builtin/grep.c. I think you are right. > > Looks like grep_cmd_config() is the right place to parse it. Something like: > > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt) > static int grep_cmd_config(const char *var, const char *value, void *cb) > { > int st = grep_config(var, value, cb); > + if (thread_config(var, value, cb) < 0) > + st = -1; > if (git_color_default_config(var, value, cb) < 0) > st = -1; > return st; > > What do you think? I'd be tempted to open code the "grep.threads" case in this function rather than introducing a helper for a single variable, but I don't think it matters either way. This looks good. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Add git-grep threads param
On Mon, Oct 26, 2015 at 03:32:13PM +0300, Victor Leschuk wrote: > Make number of git-grep worker threads a configuration parameter. > According to several tests on systems with different number of CPU cores > the hard-coded number of 8 threads is not optimal for all systems: > tuning this parameter can significantly speed up grep performance. > > Signed-off-by: Victor Leschuk> --- > Documentation/config.txt | 4 > Documentation/git-grep.txt | 4 > builtin/grep.c | 34 > ++ > contrib/completion/git-completion.bash | 1 + > grep.c | 10 ++ > grep.h | 2 ++ > 6 files changed, 47 insertions(+), 8 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 391a0c3..1c95587 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: > option is ignored when the 'grep.patternType' option is set to a value > other than 'default'. > > +grep.threads:: > + Number of grep worker threads, use it to tune up performance on > + multicore machines. Default value is 8. > + > gpg.program:: > Use this custom program instead of "gpg" found on $PATH when > making or verifying a PGP signature. The program must support the > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index 4a44d6d..fbd4f83 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -22,6 +22,7 @@ SYNOPSIS > [--color[=] | --no-color] > [--break] [--heading] [-p | --show-function] > [-A ] [-B ] [-C ] > +[--threads ] Is this the best place for this option? I know the current list isn't sorted in any particular way, but here you're splitting up the set of context options (`-A`, `-B`, `-C` and `-W`). > [-W | --function-context] > [-f ] [-e] > [--and|--or|--not|(|)|-e ...] > @@ -220,6 +221,9 @@ OPTIONS > Show leading lines, and place a line containing > `--` between contiguous groups of matches. > > +--threads :: > + Set number of worker threads to . Default is 8. The same comment as above applies here. > -W:: > --function-context:: > Show the surrounding text from the previous line containing a > diff --git a/builtin/grep.c b/builtin/grep.c > index d04f440..5ef1b07 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -27,8 +27,7 @@ static char const * const grep_usage[] = { > static int use_threads = 1; > > #ifndef NO_PTHREADS > -#define THREADS 8 > -static pthread_t threads[THREADS]; > +static pthread_t *threads; > > /* We use one producer thread and THREADS consumer > * threads. The producer adds struct work_items to 'todo' and the > @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt) > strbuf_init([i].out, 0); > } > > - for (i = 0; i < ARRAY_SIZE(threads); i++) { > + threads = xcalloc(opt->num_threads, sizeof(pthread_t)); > + for (i = 0; i < opt->num_threads; i++) { > int err; > struct grep_opt *o = grep_opt_dup(opt); > o->output = strbuf_out; > @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt) > } > } > > -static int wait_all(void) > +static int wait_all(struct grep_opt *opt) I'm not sure passing a grep_opt in here is the cleanest way to do this. Options are a UI concept and all we care about here is the number of threads. Since `threads` is a global, shouldn't the number of threads be a global as well? Could we reuse `use_threads` here (possibly renaming it `num_threads`)? > { > int hit = 0; > int i; > @@ -238,12 +238,14 @@ static int wait_all(void) > pthread_cond_broadcast(_add); > grep_unlock(); > > - for (i = 0; i < ARRAY_SIZE(threads); i++) { > + for (i = 0; i < opt->num_threads; i++) { > void *h; > pthread_join(threads[i], ); > hit |= (int) (intptr_t) h; > } > > + free(threads); > + > pthread_mutex_destroy(_mutex); > pthread_mutex_destroy(_read_mutex); > pthread_mutex_destroy(_attr_mutex); > @@ -256,7 +258,7 @@ static int wait_all(void) > } > #else /* !NO_PTHREADS */ > > -static int wait_all(void) > +static int wait_all(struct grep_opt *opt) > { > return 0; > } > @@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char > *prefix) > N_("show context lines before matches")), > OPT_INTEGER('A', "after-context", _context, > N_("show context lines after matches")), > + OPT_INTEGER(0, "threads", _threads, > + N_("use worker threads")), > OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"), > context_callback),
RE: [PATCH v3] Add git-grep threads param
Hello John, see comments inline. >> @@ -22,6 +22,7 @@ SYNOPSIS >> [--color[=] | --no-color] >> [--break] [--heading] [-p | --show-function] >> [-A ] [-B ] [-C ] >> +[--threads ] > Is this the best place for this option? I know the current list isn't > sorted in any particular way, but here you're splitting up the set of > context options (`-A`, `-B`, `-C` and `-W`). Agree, I'll move the option both here and in documentation. >> -static int wait_all(void) >> +static int wait_all(struct grep_opt *opt) > I'm not sure passing a grep_opt in here is the cleanest way to do this. > Options are a UI concept and all we care about here is the number of > threads. > Since `threads` is a global, shouldn't the number of threads be a global > as well? Could we reuse `use_threads` here (possibly renaming it > `num_threads`)? This thought also crossed my mind, however we already pass grep_opt to start_threads() function, so I think passing it to wait_all() is not that ugly, and kind of symmetric. And I do not like the idea of duplicating same information in different places. What do you think? -- Best Regards, Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html