RE: [PATCH v3] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Hello Linus, 

>> According to several tests on systems with different number of CPU cores
>> the hard-coded number of 8 threads is not optimal for all systems:

> Did you also compare cold-cache filesystem performance?

>  One of the reasons for doing threaded grep is for CPU scaling. But another 
> is for IO scaling. If your git tree is over NFS, doing grep eight threads at 
> a time if likely going to make things much faster even if you are on a single 
> CPU.

Yes, I have performed tests on cold-cache FS and it looks like number of 
threads affects performance. Here are the results for grepping linux kernel 
repo on a 4-core machine (similar test was conducted on 8-core machine):

Threads: 4 Time: 39.13
Threads: 8 Time: 34.39
Threads: 16 Time: 31.46
Threads: 32 Time: 27.40

Here is test scenario:

#!/bin/bash
TIMEFORMAT=%R
GIT=/home/del/git-dev/bin/git
TESTS=10
for n in 4 8 16 32; do
echo -n "Threads: $n Time: "
for i in $(seq 1 $TESTS); do
echo 3 > /proc/sys/vm/drop_caches
time $GIT grep --threads $n -e '#define' --and \( -e MAX_PATH -e 
PATH_MAX \)  >/dev/null
done 2>&1 | awk -v ntests=${TESTS} '{sum+=$1} END{printf "%.2f\n", 
sum/ntests}'
done

Note: With hot-cache grepping with 4 threads gives fastest results on both 
4-core and 8-core machines.

Thus I think it can be useful for users to be able to tune the threads number 
according to their needs.

--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Add git-grep threads param

2015-10-27 Thread John Keeping
On Mon, Oct 26, 2015 at 10:25:41PM -0700, Victor Leschuk wrote:
> >> @@ -22,6 +22,7 @@ SYNOPSIS
> >>  [--color[=] | --no-color]
> >>  [--break] [--heading] [-p | --show-function]
> >>  [-A ] [-B ] [-C ]
> >> +[--threads ]
> 
> > Is this the best place for this option?  I know the current list isn't
> > sorted in any particular way, but here you're splitting up the set of
> > context options (`-A`, `-B`, `-C` and `-W`).
> 
> Agree, I'll move the option both here and in documentation.
> 
> >> -static int wait_all(void)
> >> +static int wait_all(struct grep_opt *opt)
> 
> > I'm not sure passing a grep_opt in here is the cleanest way to do this.
> > Options are a UI concept and all we care about here is the number of
> > threads.
> 
> > Since `threads` is a global, shouldn't the number of threads be a global
> > as well?  Could we reuse `use_threads` here (possibly renaming it
> > `num_threads`)?
> 
> This thought also crossed my mind, however we already pass grep_opt to
> start_threads() function, so I think passing it to wait_all() is not
> that ugly, and kind of symmetric. And I do not like the idea of
> duplicating same information in different places. What do you think?

The grep_opt in start_threads() is being passed through to run(), so it
seems slightly different to me.  If the threads were being setup in
grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in
grep_opt, but since this is local to this particular user of the grep
infrastructure adding num_threads to the grep_opt structure at all feels
wrong to me.

Note that I wasn't suggesting passing num_threads as a parameter to
wait_all(), but rather having it as global state that is accessed by
wait_all() in the same way as the `threads` array.

If we rename use_threads to num_threads and just use that, then we only
have the information in one place don't we?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Hello John,

>> This thought also crossed my mind, however we already pass grep_opt to
>> start_threads() function, so I think passing it to wait_all() is not
>> that ugly, and kind of symmetric. And I do not like the idea of
>> duplicating same information in different places. What do you think?

> The grep_opt in start_threads() is being passed through to run(), so it
> seems slightly different to me.  If the threads were being setup in
> grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in
> grep_opt, but since this is local to this particular user of the grep
> infrastructure adding num_threads to the grep_opt structure at all feels
> wrong to me.

> Note that I wasn't suggesting passing num_threads as a parameter to
> wait_all(), but rather having it as global state that is accessed by
> wait_all() in the same way as the `threads` array.

> If we rename use_threads to num_threads and just use that, then we only
> have the information in one place don't we?

Yeah, I understood your idea. So we parse config_value directly to 

static int num_threads; /* old use_threads */

And use it internally in builtin/grep.c. I think you are right.

Looks like grep_cmd_config() is the right place to parse it. Something like:

--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt)
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
+   if (thread_config(var, value, cb) < 0)
+   st = -1;
if (git_color_default_config(var, value, cb) < 0)
st = -1;
return st;

What do you think?

--
Best Regards,
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Add git-grep threads param

2015-10-27 Thread John Keeping
On Tue, Oct 27, 2015 at 06:54:16AM -0700, Victor Leschuk wrote:
> Hello John,
> 
> >> This thought also crossed my mind, however we already pass grep_opt to
> >> start_threads() function, so I think passing it to wait_all() is not
> >> that ugly, and kind of symmetric. And I do not like the idea of
> >> duplicating same information in different places. What do you think?
> 
> > The grep_opt in start_threads() is being passed through to run(), so it
> > seems slightly different to me.  If the threads were being setup in
> > grep.c (as opposed to builtin/grep.c) then I'd agree that it belongs in
> > grep_opt, but since this is local to this particular user of the grep
> > infrastructure adding num_threads to the grep_opt structure at all feels
> > wrong to me.
> 
> > Note that I wasn't suggesting passing num_threads as a parameter to
> > wait_all(), but rather having it as global state that is accessed by
> > wait_all() in the same way as the `threads` array.
> 
> > If we rename use_threads to num_threads and just use that, then we only
> > have the information in one place don't we?
> 
> Yeah, I understood your idea. So we parse config_value directly to 
> 
> static int num_threads; /* old use_threads */

Presumably this is:

static int num_threads = -1;

so that the default behaviour continues to work correctly.

> And use it internally in builtin/grep.c. I think you are right.
> 
> Looks like grep_cmd_config() is the right place to parse it. Something like:
> 
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -267,6 +267,8 @@ static int wait_all(struct grep_opt *opt)
>  static int grep_cmd_config(const char *var, const char *value, void *cb)
>  {
> int st = grep_config(var, value, cb);
> +   if (thread_config(var, value, cb) < 0)
> +   st = -1;
> if (git_color_default_config(var, value, cb) < 0)
> st = -1;
> return st;
> 
> What do you think?

I'd be tempted to open code the "grep.threads" case in this function
rather than introducing a helper for a single variable, but I don't
think it matters either way.  This looks good.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Add git-grep threads param

2015-10-26 Thread John Keeping
On Mon, Oct 26, 2015 at 03:32:13PM +0300, Victor Leschuk wrote:
> Make number of git-grep worker threads a configuration parameter.
> According to several tests on systems with different number of CPU cores
> the hard-coded number of 8 threads is not optimal for all systems:
> tuning this parameter can significantly speed up grep performance.
> 
> Signed-off-by: Victor Leschuk 
> ---
>  Documentation/config.txt   |  4 
>  Documentation/git-grep.txt |  4 
>  builtin/grep.c | 34 
> ++
>  contrib/completion/git-completion.bash |  1 +
>  grep.c | 10 ++
>  grep.h |  2 ++
>  6 files changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..1c95587 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1447,6 +1447,10 @@ grep.extendedRegexp::
>   option is ignored when the 'grep.patternType' option is set to a value
>   other than 'default'.
>  
> +grep.threads::
> + Number of grep worker threads, use it to tune up performance on
> + multicore machines. Default value is 8.
> +
>  gpg.program::
>   Use this custom program instead of "gpg" found on $PATH when
>   making or verifying a PGP signature. The program must support the
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 4a44d6d..fbd4f83 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -22,6 +22,7 @@ SYNOPSIS
>  [--color[=] | --no-color]
>  [--break] [--heading] [-p | --show-function]
>  [-A ] [-B ] [-C ]
> +[--threads ]

Is this the best place for this option?  I know the current list isn't
sorted in any particular way, but here you're splitting up the set of
context options (`-A`, `-B`, `-C` and `-W`).

>  [-W | --function-context]
>  [-f ] [-e] 
>  [--and|--or|--not|(|)|-e ...]
> @@ -220,6 +221,9 @@ OPTIONS
>   Show  leading lines, and place a line containing
>   `--` between contiguous groups of matches.
>  
> +--threads ::
> + Set number of worker threads to . Default is 8.

The same comment as above applies here.

>  -W::
>  --function-context::
>   Show the surrounding text from the previous line containing a
> diff --git a/builtin/grep.c b/builtin/grep.c
> index d04f440..5ef1b07 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -27,8 +27,7 @@ static char const * const grep_usage[] = {
>  static int use_threads = 1;
>  
>  #ifndef NO_PTHREADS
> -#define THREADS 8
> -static pthread_t threads[THREADS];
> +static pthread_t *threads;
>  
>  /* We use one producer thread and THREADS consumer
>   * threads. The producer adds struct work_items to 'todo' and the
> @@ -206,7 +205,8 @@ static void start_threads(struct grep_opt *opt)
>   strbuf_init([i].out, 0);
>   }
>  
> - for (i = 0; i < ARRAY_SIZE(threads); i++) {
> + threads = xcalloc(opt->num_threads, sizeof(pthread_t));
> + for (i = 0; i < opt->num_threads; i++) {
>   int err;
>   struct grep_opt *o = grep_opt_dup(opt);
>   o->output = strbuf_out;
> @@ -220,7 +220,7 @@ static void start_threads(struct grep_opt *opt)
>   }
>  }
>  
> -static int wait_all(void)
> +static int wait_all(struct grep_opt *opt)

I'm not sure passing a grep_opt in here is the cleanest way to do this.
Options are a UI concept and all we care about here is the number of
threads.

Since `threads` is a global, shouldn't the number of threads be a global
as well?  Could we reuse `use_threads` here (possibly renaming it
`num_threads`)?

>  {
>   int hit = 0;
>   int i;
> @@ -238,12 +238,14 @@ static int wait_all(void)
>   pthread_cond_broadcast(_add);
>   grep_unlock();
>  
> - for (i = 0; i < ARRAY_SIZE(threads); i++) {
> + for (i = 0; i < opt->num_threads; i++) {
>   void *h;
>   pthread_join(threads[i], );
>   hit |= (int) (intptr_t) h;
>   }
>  
> + free(threads);
> +
>   pthread_mutex_destroy(_mutex);
>   pthread_mutex_destroy(_read_mutex);
>   pthread_mutex_destroy(_attr_mutex);
> @@ -256,7 +258,7 @@ static int wait_all(void)
>  }
>  #else /* !NO_PTHREADS */
>  
> -static int wait_all(void)
> +static int wait_all(struct grep_opt *opt)
>  {
>   return 0;
>  }
> @@ -702,6 +704,8 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
>   N_("show  context lines before matches")),
>   OPT_INTEGER('A', "after-context", _context,
>   N_("show  context lines after matches")),
> + OPT_INTEGER(0, "threads", _threads,
> + N_("use  worker threads")),
>   OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"),
>   context_callback),

RE: [PATCH v3] Add git-grep threads param

2015-10-26 Thread Victor Leschuk
Hello John,

see comments inline.

>> @@ -22,6 +22,7 @@ SYNOPSIS
>>  [--color[=] | --no-color]
>>  [--break] [--heading] [-p | --show-function]
>>  [-A ] [-B ] [-C ]
>> +[--threads ]

> Is this the best place for this option?  I know the current list isn't
> sorted in any particular way, but here you're splitting up the set of
> context options (`-A`, `-B`, `-C` and `-W`).

Agree, I'll move the option both here and in documentation.

>> -static int wait_all(void)
>> +static int wait_all(struct grep_opt *opt)

> I'm not sure passing a grep_opt in here is the cleanest way to do this.
> Options are a UI concept and all we care about here is the number of
> threads.

> Since `threads` is a global, shouldn't the number of threads be a global
> as well?  Could we reuse `use_threads` here (possibly renaming it
> `num_threads`)?

This thought also crossed my mind, however we already pass grep_opt to 
start_threads() function,
so I think passing it to wait_all() is not that ugly, and kind of symmetric. 
And I do not like the idea
of duplicating same information in different places. What do you think?

--
Best Regards,
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html