RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk

Correct.

I think adding the option (both to command line and to config file) is good, as 
long as the IO issues are documented. And default to just the fixed number of 
threads for now - and with the option, maybe people can then more easily try 
out different scenarios and maybe improve on the particular choice of fixed 
number of threads.

> Or maybe the default should be something that isn't even described by any 
> particular fixed number.

> For example, maybe the default that value could be something quite dynamic: 
> start off with a single thread (or a very low thread number) and just set a 
> timer. After 0.1s, if CPU usage is low, start more threads. After another 
> 0.1s, if that improved things, maybe > we could add still more threads...

> Note that "CPU usage is low" can be hard to get portably, but we could 
> approximate it with "how much work did we actually get done". If we only 
> grepped a couple of files, that might be because of IO issues. And if speed 
> does not improve when we move from a 
> single thread to, say, four threads, then we should probably *not* increase 
> the thread number again at 0.2s.

> So I think there are many possible avenues to explore that might be 
> interesting. I do *not* think that "online_cpus()" is one of them, > except 
> perhaps as a very rough measure of "is this a beefy system or not" (but even 
> that is questionable - 32 CPUs is definitely 
> likely "very beefy, so use lots of threads", but even 8 CPUs might still be 
> just a phone, and I'm not sure that tells you a lot, really.
Linus

Here is my version of note for Documentaion:

Number of grep worker threads, use it to tune up performance on
your machines. Leave it unset or set to "0" if you want to use default 
number
(currently default number is 8 for all systems, however this behavior 
can
 be changed in future versions to better suite your hardware and 
circumstances).


--
Victor

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk

 > if (list.nr || cached)
>num_threads = 1;
 >  if (!num_threads)
> num_threads = GREP_NUM_THREADS_DEFAULT;

>  and then later, instead of use_threads, do:

 >  if (num_threads <= 1) {
... do single-threaded version ...
  > } else {
... do multi-threaded version ...
  > }

 > That matches the logic in builtin/pack-objects.c.

Maybe use the simplest version (and keep num_numbers == 0 also as flag for all 
other checks in code like if(num_flags)  ):

if (list.nr || cached )
  num_threads = 0; // do not use threads
else if (num_threads == 0)
  num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
else if (num_threads < 0)
  die(...)

// here we are num_threads > zero, so do nothing


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Jeff King
On Mon, Nov 09, 2015 at 09:55:34AM -0800, Linus Torvalds wrote:

> > if (list.nr || cached )
> >   num_threads = 0; // do not use threads
> > else if (num_threads == 0)
> >   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
> 
> I will say this AGAIN.
> 
> The number of threads is *not* about the number of CPU's. Stop this
> craziness. It's wrong.

I agree with you (and that's why we have a hard-coded default here, and
not online_cpus()).

This check is just whether to turn the threading on or not, based on
whether we have multiple CPUs at all. And I agree that probably should
_not_ be respecting online_cpus() either, because we are better off
parallelizing even on a single CPU to avoid fs latency.

But note that this is already what the code _currently_ does. I do not
mind changing it, but that is a separate notion from Victor's topic,
which is to make the number of threads configurable at all.

What I'd really hope to see is:

  1. Victor adds --threads support, and changes _nothing else_ about the
 defaults.

  2. People experiment with --threads to see what works best in a
 variety of cases (especially things like cold-cache NFS). Then we
 get patches based on real world numbers to adjust:

   2a. GREP_NUM_THREADS_DEFAULT (either bumping the hard-coded
   value, or using some dynamic heuristic)

   2b. What to do for the single-CPU case.

We're doing step 1 here. I don't mind jumping to 2b if we're pretty sure
it's the right thing (and I think it probably is), but it should
definitely be a separate patch.

> So *none* of the threading in git is about CPU's. Maybe we should add
> big honking comments about that.

Minor nit: there definitely _are_ CPU-bound parallel tasks in git.
pack-objects and index-pack come to mind. If we add any user-visible
advice about tweaking thread configuration, we should be sure
that it is scoped to the appropriate flags.

> And that big honking comment should be in the documentation for the
> new flag too. Because it would be really really sad if people say "I
> have a laptop with just two cores, so I'll set the threading option to
> 2", when they then work mostly over a slow wireless network and their
> company wants minimal local installs.

Agreed. It would probably make sense for the "--threads" option
documentation for each command to discuss whether it is meant to
parallelize work across CPUs, or avoid filesystem latency.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Stefan Beller
On Mon, Nov 9, 2015 at 9:55 AM, Linus Torvalds
 wrote:
>
> So stop with the "online_cpus()" stuff. And don't base your benchmarks
> purely on the CPU-bound case. Because the CPU-bound case is the case
> that is already generally so good that few people will care all *that*
> deeply.
>
> Many of the things git does are not for "best-case" behavior, but to
> avoid bad "worst-case" situations. Look at things like the index
> preloading (also threaded). The big win there is - again - when the
> stat() calls may need IO. Sure, it can help for CPU use too, but
> especially on Linux, cached "stat()" calls are really quite cheap. The
> big upside is, again, in situations like git repositories over NFS.
>
> In the CPU-intensive case, the threading might make things go from a
> couple of seconds to half a second. Big deal. You're not getting up to
> get a coffee in either case.

Chiming in here as I have another series in flight doing parallelism.
(Submodules done in parallel including fetching, cloning, checking out)

online_cpus() seems to be one of the easiest ballpark estimates for
the power of a system.

So what I would have liked to use would be some kind of

  parallel_expect_bottleneck(enum kinds);

with kinds being one of (FS, NETWORK, CPU, MEMORY?)
to get an estimate 'good' number to use.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Jeff King
On Mon, Nov 09, 2015 at 08:34:27AM -0800, Victor Leschuk wrote:

> > Why don't we leave it at 8, then? That's the conservative choice, and
> > once we have --threads, people can easily experiment with different
> > values and we can follow-up with a change to the default if need be.
>
> I'd propose the following:
>
> if (list.nr || cached) {
> num_threads = 0; /* Can not multi-thread object lookup */
> }
> else if (num_threads < 0 && online_cpus() <= 1) {
> num_threads = 0; /* User didn't set threading option and we have <= 1 
> of hardware cores */
> }

OK, so you are presumably initializing:

  static int num_threads = -1;

> else if (num_threads == 0) {
> num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose 
> default behavior */
> }
> else if (num_threads < 0) {  /* Actually this one should be checked 
> earlier so no need to double check here */
> die(_("Ivalid number of threads specified (%d)"), num_threads)
> }

What happens if the user has not specified a value (nr_threads == -1)?
Here you die, but shouldn't you take the default thread value?

I wonder if it would be simpler to just default to 0, and then treat
negative values the same as 0 (which is what pack.threads does). Like:

  if (list.nr || cached)
num_threads = 1;
  if (!num_threads)
num_threads = GREP_NUM_THREADS_DEFAULT;

and then later, instead of use_threads, do:

  if (num_threads <= 1) {
... do single-threaded version ...
  } else {
... do multi-threaded version ...
  }

That matches the logic in builtin/pack-objects.c.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Linus Torvalds
On Mon, Nov 9, 2015 at 10:32 AM, Victor Leschuk
 wrote:
> On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
>>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
>
> Actually I have never said the nCPUs played main role in it. T

The pseudo-code you sent disagrees. Not that "online_cpus() <= 1" is
likely to ever be really an issue on any development platform from the
last decade.

However, I do have to admit that that "online_cpus()" check goes back
a long time, so I guess I can't really blame you.

At least in the index preloading, I was very conscious of the IO
issues. It doesn't actually make a big difference on traditional disks
(seek times dominate, and concurrent IO often doesn't help at all),
but the reason I keep on bringing up NFS is that back when I used CVS
(oh, the horrors), I *also* worked at a company that did everything
over NFS.  CPU ended up almost never being the limiting factor for any
SCM operation.

So I don't have a very good idea of *what* we should use for automatic
thread detection, but I'm pretty sure online_cpu's should not be it.
Except, like Jeff mentioned, for pack formation (which does tend to be
all about CPU).

Sadly, detecting what kind of filesystem you are on and how well
cached it is, is really pretty hard. Even when you have OS-specific
knowledge, and can look up the *type* of the filesystem, what often
matters more is things like "is the filesystem on a rotational media
or using flash?" etc.

In the meantime I'd argue for just getting rid of the online_cpu's
check, because

 (a) I think it's actively misleading

 (b) the threaded grep probably doesn't hurt much even on a single
CPU, and the _potential_ upside from IO could easily dwarf the cost.

 (c) do developers actually have single-core machines any more?

But if somebody can come up with a smarter model, that would certainly
be good too. The IO advantages really don't tend to be there for
rotational media, but for both flash and network filesystems, threaded
IO can be a huge deal.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Linus Torvalds
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
 wrote:
>
> Maybe use the simplest version (and keep num_numbers == 0 also as flag for 
> all other checks in code like if(num_flags)  ):
>
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

I will say this AGAIN.

The number of threads is *not* about the number of CPU's. Stop this
craziness. It's wrong.

The number of threads is about parallelism. Yes, CPU's is a small part
of it. But as mentioned earlier, the *big* wins are for slow
filesystems, NFS in particular. On NFS, even if you have things
cached, the cache revalidation is going to cause network traffic
almost every time. Being able to have several of those outstanding is
a big deal.

So stop with the "online_cpus()" stuff. And don't base your benchmarks
purely on the CPU-bound case. Because the CPU-bound case is the case
that is already generally so good that few people will care all *that*
deeply.

Many of the things git does are not for "best-case" behavior, but to
avoid bad "worst-case" situations. Look at things like the index
preloading (also threaded). The big win there is - again - when the
stat() calls may need IO. Sure, it can help for CPU use too, but
especially on Linux, cached "stat()" calls are really quite cheap. The
big upside is, again, in situations like git repositories over NFS.

In the CPU-intensive case, the threading might make things go from a
couple of seconds to half a second. Big deal. You're not getting up to
get a coffee in either case.

In the network traffic case, the threading might make things go from
one minute to ten seconds. And *that* is a big deal. That's huge.
That's "annoyingly slow" to "oh, this is the fastest SCM I have ever
worked with in my life".

That can literally be something that changes how you work for a
developer. You start doing things that you simply would never
otherwise do.

So *none* of the threading in git is about CPU's. Maybe we should add
big honking comments about that.

And that big honking comment should be in the documentation for the
new flag too. Because it would be really really sad if people say "I
have a laptop with just two cores, so I'll set the threading option to
2", when they then work mostly over a slow wireless network and their
company wants minimal local installs.

The biggest reason to NOT EVER add that configuration is literally the
confusion about this being about CPU's. That was what got the whole
thread started, that was what the original benchmark numbers were
about, and that was WRONG.

So I would strongly suggest that Junio ignore these patches unless
they very clearly talk about the whole IO issue. Both in the source
code, in the commit messages, and in the documentation for the config
option.

   Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Jeff King
On Mon, Nov 09, 2015 at 09:28:12AM -0800, Victor Leschuk wrote:

> Maybe use the simplest version (and keep num_numbers == 0 also as flag for 
> all other checks in code like if(num_flags)  ):
> 
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

That's OK.

> else if (num_threads < 0)
>   die(...)

Do we really want to die here? I think "threads < 0" works the same as
"threads==0" in other git programs. It's also a weird place to die. It
would make:

  git grep --cached --threads=-1

silently work, while:

  git grep --threads=-1

would die.

If we do accept it, it may make sense to normalize it to 0 so that you
can just check "!num_threads" elsewhere in the code.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
 wrote:
>
> Maybe use the simplest version (and keep num_numbers == 0 also as flag for 
> all other checks in code like if(num_flags)  ):
>
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

> I will say this AGAIN.

> The number of threads is *not* about the number of CPU's. Stop this
craziness. It's wrong.

Actually I have never said the nCPUs played main role in it. The patch is 
intended to provide user ability to change this threads number according to 
their needs and to touch as small amount of other code as possible.

> The number of threads is about parallelism. Yes, CPU's is a small part
> of it. But as mentioned earlier, the *big* wins are for slow
> filesystems, NFS in particular. On NFS, even if you have things
> cached, the cache revalidation is going to cause network traffic
> almost every time. Being able to have several of those outstanding is
> a big deal.

> So stop with the "online_cpus()" stuff. And don't base your benchmarks
> purely on the CPU-bound case. Because the CPU-bound case is the case
> that is already generally so good that few people will care all *that*
> deeply.

I have performed a cold-cached FS test (in previous thread to minimize the CPU 
part in the results) and it showed high correlation between speed and 
thread_num. Isn't it what you said? Even on systems with small number of cores 
we can gain profit of multithreading. That's I why I suggest this param to be 
customizable and not HARDCODED.

We need to create a clear text for the documentation that this number should 
not based on number of CPU-s only. Currently do not mention anything on it.

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Jeff King
On Mon, Nov 09, 2015 at 03:36:16AM -0800, Victor Leschuk wrote:

> Yeah do also think it would be more reasonable to use "0" for
> "autodetect" default value. However chat this autodetect value should
> be?
> 
> For index index-pack and pack-objects we use ncpus() for this, however
> according to my tests this wouldn't an Ideal for all cases. Maybe it
> should be something like ncpus()*2, 
> anyway before it we even used hard-coded 8 for all systems...

Why don't we leave it at 8, then? That's the conservative choice, and
once we have --threads, people can easily experiment with different
values and we can follow-up with a change to the default if need be.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
> Why don't we leave it at 8, then? That's the conservative choice, and
> once we have --threads, people can easily experiment with different
> values and we can follow-up with a change to the default if need be.

I'd propose the following:

if (list.nr || cached) {

num_threads = 0; /* Can not multi-thread object lookup */   

}   

else if (num_threads < 0 && online_cpus() <= 1) {   

num_threads = 0; /* User didn't set threading option and we have <= 1 
of hardware cores */  
}   

else if (num_threads == 0) {

num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose 
default behavior */   
}   

else if (num_threads < 0) {  /* Actually this one should be checked earlier 
so no need to double check here */  
 
die(_("Ivalid number of threads specified (%d)"), num_threads)  

} 

--
Victor
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk
.

> In the meantime I'd argue for just getting rid of the online_cpu's
> check, because

>  (a) I think it's actively misleading

>  (b) the threaded grep probably doesn't hurt much even on a single
> CPU, and the _potential_ upside from IO could easily dwarf the cost.

>  (c) do developers actually have single-core machines any more?

  Linus

So as far as I understood your point at current moment would be better to leave 
online_cpus() check behind,
keep the default threads value (and do so for other threaded programs, in 
separate patches of course).

After that we may focus (in future) on developing smarter heuristics for 
parallelity-relationed params.

Also we should specify in documentation that number of your CPUs may not the 
optimal value and customer should find his own best values based on other 
circumstances.

Correct?

--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-09 Thread Victor Leschuk


From: Jeff King [p...@peff.net]
Sent: Tuesday, November 03, 2015 22:40
To: Junio C Hamano
Cc: Victor Leschuk; git@vger.kernel.org; Victor Leschuk; 
torva...@linux-foundation.org; j...@keeping.me.uk
Subject: Re: [PATCH v4] Add git-grep threads param

On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote:

> > +grep.threads::
> > +   Number of grep worker threads, use it to tune up performance on
> > +   multicore machines. Default value is 8. Set to 0 to disable threading.
> > +
>
> I am not enthused by this "Set to 0 to disable".  As Zero is
> magical, it would be more useful if 1 meant that threading is not
> used (i.e. there is only 1 worker), and 0 meant that we would
> automatically pick some reasonable parallelism for you (and we
> promise that the our choice would not be outrageously wrong), or
> something like that.

> Not just useful, but consistent with other parts of git, like
> pack.threads, where "0" already means "autodetect".

Hello Peff and Junio,

Yeah do also think it would be more reasonable to use "0" for "autodetect" 
default value. However chat this autodetect value should be?

For index index-pack and pack-objects we use ncpus() for this, however 
according to my tests this wouldn't an Ideal for all cases. Maybe it should be 
something like ncpus()*2, 
anyway before it we even used hard-coded 8 for all systems...

In this case we use "1" as "Do not use threads" and die on all negative numbers 
during parsing.

Agreed?
--
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-03 Thread Victor Leschuk
>> do we have any objections on this patch?

> The question you should be asking is "do we have any support".

Hello all, CCing participated reviewers. As Junio  has correctly mentioned: "Do 
we have any support for including this functionality?"

I think this kind of customization can be useful in tuning up performance as 
confirmed by conducted tests. And in most cases having anything hard-coded is 
worse than giving users opportunity to change it, isn't it?
--
Best Regards,
Victor--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-03 Thread Junio C Hamano
Victor Leschuk  writes:

> Make number of git-grep worker threads a configuration parameter.
> According to several tests on systems with different number of CPU cores
> the hard-coded number of 8 threads is not optimal for all systems:
> tuning this parameter can significantly speed up grep performance.
>
> Signed-off-by: Victor Leschuk 
> ---
>  Documentation/config.txt   |  4 +++
>  Documentation/git-grep.txt |  9 ++
>  builtin/grep.c | 56 
> --
>  contrib/completion/git-completion.bash |  1 +
>  4 files changed, 54 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..1dd2a61 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1447,6 +1447,10 @@ grep.extendedRegexp::
>   option is ignored when the 'grep.patternType' option is set to a value
>   other than 'default'.
>  
> +grep.threads::
> + Number of grep worker threads, use it to tune up performance on
> + multicore machines. Default value is 8. Set to 0 to disable threading.
> +

I am not enthused by this "Set to 0 to disable".  As Zero is
magical, it would be more useful if 1 meant that threading is not
used (i.e. there is only 1 worker), and 0 meant that we would
automatically pick some reasonable parallelism for you (and we
promise that the our choice would not be outrageously wrong), or
something like that.

Of course, we can do grep.threads=auto to make it even more
explicit, but I'd imagine that the in-core code to parse the option
and config to the num_threads variable would need one "int" value to
represent that "auto", and 0 would be a natural choice for that.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-03 Thread Jeff King
On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote:

> > +grep.threads::
> > +   Number of grep worker threads, use it to tune up performance on
> > +   multicore machines. Default value is 8. Set to 0 to disable threading.
> > +
> 
> I am not enthused by this "Set to 0 to disable".  As Zero is
> magical, it would be more useful if 1 meant that threading is not
> used (i.e. there is only 1 worker), and 0 meant that we would
> automatically pick some reasonable parallelism for you (and we
> promise that the our choice would not be outrageously wrong), or
> something like that.

Not just useful, but consistent with other parts of git, like
pack.threads, where "0" already means "autodetect".

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] Add git-grep threads param

2015-11-02 Thread Junio C Hamano
Victor Leschuk  writes:

> do we have any objections on this patch?

The question you should be asking is "do we have any support".

It is not like the default for any series is to be included; it is
quite the opposite.  "Is this worth having in our tree?" is the
question we all ask ourselves.

Also I think you should have CC'ed those who gave review comments on
this topic in the earlier rounds, if you wanted to receive answers
to that question (whichever one it is ;-).

Having said that, I didn't immediately spot anything objectionable.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v4] Add git-grep threads param

2015-11-01 Thread Victor Leschuk
Hello all,

do we have any objections on this patch?

--
Best Regards,
Victor

From: Victor Leschuk [vlesc...@gmail.com]
Sent: Tuesday, October 27, 2015 14:22
To: git@vger.kernel.org
Cc: Victor Leschuk
Subject: [PATCH v4] Add git-grep threads param

Make number of git-grep worker threads a configuration parameter.
According to several tests on systems with different number of CPU cores
the hard-coded number of 8 threads is not optimal for all systems:
tuning this parameter can significantly speed up grep performance.

Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com>
---
 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt |  9 ++
 builtin/grep.c | 56 --
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1dd2a61 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..e766596 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.

+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.

@@ -227,6 +232,10 @@ OPTIONS
effectively showing the whole function in which the match was
found.

+--threads ::
+   Set number of worker threads to . Default is 8.
+   Set to 0 to disable threading.
+
 -f ::
Read patterns from , one per line.

diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..694553e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };

-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = -1;

 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;

 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;

 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }

 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }

@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}

-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();

-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}

+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -262,10 +265,22 @@ static int wait_all(void)
 }
 #endif

+static int grep_threads_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "grep.threads")) {
+   num_threads = git_config_int(var, value);
+   if (num_threads < 0)
+   die("Invalid number of threads specified (%d)", 
num_threads);
+   }
+   return 0;
+}
+
 static int grep_cmd_config(const char *var, const char *value, vo

[PATCH v4] Add git-grep threads param

2015-10-27 Thread Victor Leschuk
Make number of git-grep worker threads a configuration parameter.
According to several tests on systems with different number of CPU cores
the hard-coded number of 8 threads is not optimal for all systems:
tuning this parameter can significantly speed up grep performance.

Signed-off-by: Victor Leschuk 
---
 Documentation/config.txt   |  4 +++
 Documentation/git-grep.txt |  9 ++
 builtin/grep.c | 56 --
 contrib/completion/git-completion.bash |  1 +
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..1dd2a61 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1447,6 +1447,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 gpg.program::
Use this custom program instead of "gpg" found on $PATH when
making or verifying a PGP signature. The program must support the
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4a44d6d..e766596 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
   [--break] [--heading] [-p | --show-function]
   [-A ] [-B ] [-C ]
   [-W | --function-context]
+  [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
@@ -53,6 +54,10 @@ grep.extendedRegexp::
option is ignored when the 'grep.patternType' option is set to a value
other than 'default'.
 
+grep.threads::
+   Number of grep worker threads, use it to tune up performance on
+   multicore machines. Default value is 8. Set to 0 to disable threading.
+
 grep.fullName::
If set to true, enable '--full-name' option by default.
 
@@ -227,6 +232,10 @@ OPTIONS
effectively showing the whole function in which the match was
found.
 
+--threads ::
+   Set number of worker threads to . Default is 8.
+   Set to 0 to disable threading.
+
 -f ::
Read patterns from , one per line.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index d04f440..694553e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -24,11 +24,11 @@ static char const * const grep_usage[] = {
NULL
 };
 
-static int use_threads = 1;
+#define GREP_NUM_THREADS_DEFAULT 8
+static int num_threads = -1;
 
 #ifndef NO_PTHREADS
-#define THREADS 8
-static pthread_t threads[THREADS];
+static pthread_t *threads;
 
 /* We use one producer thread and THREADS consumer
  * threads. The producer adds struct work_items to 'todo' and the
@@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_lock(_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (use_threads)
+   if (num_threads)
pthread_mutex_unlock(_mutex);
 }
 
@@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt)
strbuf_init([i].out, 0);
}
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   threads = xcalloc(num_threads, sizeof(pthread_t));
+   for (i = 0; i < num_threads; i++) {
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
@@ -238,12 +239,14 @@ static int wait_all(void)
pthread_cond_broadcast(_add);
grep_unlock();
 
-   for (i = 0; i < ARRAY_SIZE(threads); i++) {
+   for (i = 0; i < num_threads; i++) {
void *h;
pthread_join(threads[i], );
hit |= (int) (intptr_t) h;
}
 
+   free(threads);
+
pthread_mutex_destroy(_mutex);
pthread_mutex_destroy(_read_mutex);
pthread_mutex_destroy(_attr_mutex);
@@ -262,10 +265,22 @@ static int wait_all(void)
 }
 #endif
 
+static int grep_threads_config(const char *var, const char *value, void *cb)
+{
+   if (!strcmp(var, "grep.threads")) {
+   num_threads = git_config_int(var, value);
+   if (num_threads < 0)
+   die("Invalid number of threads specified (%d)", 
num_threads);
+   }
+   return 0;
+}
+
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
-   if (git_color_default_config(var, value, cb) < 0)
+   if (grep_threads_config(var, value, cb) < 0)
+   st = -1;
+   else if (git_color_default_config(var, value, cb) < 0)
st = -1;
return st;
 }
@@ -294,7 +309,7 @@ static int grep_sha1(struct grep_opt *opt,