RE: [PATCH v4] Add git-grep threads param
Correct. I think adding the option (both to command line and to config file) is good, as long as the IO issues are documented. And default to just the fixed number of threads for now - and with the option, maybe people can then more easily try out different scenarios and maybe improve on the particular choice of fixed number of threads. > Or maybe the default should be something that isn't even described by any > particular fixed number. > For example, maybe the default that value could be something quite dynamic: > start off with a single thread (or a very low thread number) and just set a > timer. After 0.1s, if CPU usage is low, start more threads. After another > 0.1s, if that improved things, maybe > we could add still more threads... > Note that "CPU usage is low" can be hard to get portably, but we could > approximate it with "how much work did we actually get done". If we only > grepped a couple of files, that might be because of IO issues. And if speed > does not improve when we move from a > single thread to, say, four threads, then we should probably *not* increase > the thread number again at 0.2s. > So I think there are many possible avenues to explore that might be > interesting. I do *not* think that "online_cpus()" is one of them, > except > perhaps as a very rough measure of "is this a beefy system or not" (but even > that is questionable - 32 CPUs is definitely > likely "very beefy, so use lots of threads", but even 8 CPUs might still be > just a phone, and I'm not sure that tells you a lot, really. Linus Here is my version of note for Documentaion: Number of grep worker threads, use it to tune up performance on your machines. Leave it unset or set to "0" if you want to use default number (currently default number is 8 for all systems, however this behavior can be changed in future versions to better suite your hardware and circumstances). -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
> if (list.nr || cached) >num_threads = 1; > if (!num_threads) > num_threads = GREP_NUM_THREADS_DEFAULT; > and then later, instead of use_threads, do: > if (num_threads <= 1) { ... do single-threaded version ... > } else { ... do multi-threaded version ... > } > That matches the logic in builtin/pack-objects.c. Maybe use the simplest version (and keep num_numbers == 0 also as flag for all other checks in code like if(num_flags) ): if (list.nr || cached ) num_threads = 0; // do not use threads else if (num_threads == 0) num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; else if (num_threads < 0) die(...) // here we are num_threads > zero, so do nothing -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 09, 2015 at 09:55:34AM -0800, Linus Torvalds wrote: > > if (list.nr || cached ) > > num_threads = 0; // do not use threads > > else if (num_threads == 0) > > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; > > I will say this AGAIN. > > The number of threads is *not* about the number of CPU's. Stop this > craziness. It's wrong. I agree with you (and that's why we have a hard-coded default here, and not online_cpus()). This check is just whether to turn the threading on or not, based on whether we have multiple CPUs at all. And I agree that probably should _not_ be respecting online_cpus() either, because we are better off parallelizing even on a single CPU to avoid fs latency. But note that this is already what the code _currently_ does. I do not mind changing it, but that is a separate notion from Victor's topic, which is to make the number of threads configurable at all. What I'd really hope to see is: 1. Victor adds --threads support, and changes _nothing else_ about the defaults. 2. People experiment with --threads to see what works best in a variety of cases (especially things like cold-cache NFS). Then we get patches based on real world numbers to adjust: 2a. GREP_NUM_THREADS_DEFAULT (either bumping the hard-coded value, or using some dynamic heuristic) 2b. What to do for the single-CPU case. We're doing step 1 here. I don't mind jumping to 2b if we're pretty sure it's the right thing (and I think it probably is), but it should definitely be a separate patch. > So *none* of the threading in git is about CPU's. Maybe we should add > big honking comments about that. Minor nit: there definitely _are_ CPU-bound parallel tasks in git. pack-objects and index-pack come to mind. If we add any user-visible advice about tweaking thread configuration, we should be sure that it is scoped to the appropriate flags. > And that big honking comment should be in the documentation for the > new flag too. Because it would be really really sad if people say "I > have a laptop with just two cores, so I'll set the threading option to > 2", when they then work mostly over a slow wireless network and their > company wants minimal local installs. Agreed. It would probably make sense for the "--threads" option documentation for each command to discuss whether it is meant to parallelize work across CPUs, or avoid filesystem latency. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 9:55 AM, Linus Torvaldswrote: > > So stop with the "online_cpus()" stuff. And don't base your benchmarks > purely on the CPU-bound case. Because the CPU-bound case is the case > that is already generally so good that few people will care all *that* > deeply. > > Many of the things git does are not for "best-case" behavior, but to > avoid bad "worst-case" situations. Look at things like the index > preloading (also threaded). The big win there is - again - when the > stat() calls may need IO. Sure, it can help for CPU use too, but > especially on Linux, cached "stat()" calls are really quite cheap. The > big upside is, again, in situations like git repositories over NFS. > > In the CPU-intensive case, the threading might make things go from a > couple of seconds to half a second. Big deal. You're not getting up to > get a coffee in either case. Chiming in here as I have another series in flight doing parallelism. (Submodules done in parallel including fetching, cloning, checking out) online_cpus() seems to be one of the easiest ballpark estimates for the power of a system. So what I would have liked to use would be some kind of parallel_expect_bottleneck(enum kinds); with kinds being one of (FS, NETWORK, CPU, MEMORY?) to get an estimate 'good' number to use. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 09, 2015 at 08:34:27AM -0800, Victor Leschuk wrote: > > Why don't we leave it at 8, then? That's the conservative choice, and > > once we have --threads, people can easily experiment with different > > values and we can follow-up with a change to the default if need be. > > I'd propose the following: > > if (list.nr || cached) { > num_threads = 0; /* Can not multi-thread object lookup */ > } > else if (num_threads < 0 && online_cpus() <= 1) { > num_threads = 0; /* User didn't set threading option and we have <= 1 > of hardware cores */ > } OK, so you are presumably initializing: static int num_threads = -1; > else if (num_threads == 0) { > num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose > default behavior */ > } > else if (num_threads < 0) { /* Actually this one should be checked > earlier so no need to double check here */ > die(_("Ivalid number of threads specified (%d)"), num_threads) > } What happens if the user has not specified a value (nr_threads == -1)? Here you die, but shouldn't you take the default thread value? I wonder if it would be simpler to just default to 0, and then treat negative values the same as 0 (which is what pack.threads does). Like: if (list.nr || cached) num_threads = 1; if (!num_threads) num_threads = GREP_NUM_THREADS_DEFAULT; and then later, instead of use_threads, do: if (num_threads <= 1) { ... do single-threaded version ... } else { ... do multi-threaded version ... } That matches the logic in builtin/pack-objects.c. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 10:32 AM, Victor Leschukwrote: > On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk >> num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; > > Actually I have never said the nCPUs played main role in it. T The pseudo-code you sent disagrees. Not that "online_cpus() <= 1" is likely to ever be really an issue on any development platform from the last decade. However, I do have to admit that that "online_cpus()" check goes back a long time, so I guess I can't really blame you. At least in the index preloading, I was very conscious of the IO issues. It doesn't actually make a big difference on traditional disks (seek times dominate, and concurrent IO often doesn't help at all), but the reason I keep on bringing up NFS is that back when I used CVS (oh, the horrors), I *also* worked at a company that did everything over NFS. CPU ended up almost never being the limiting factor for any SCM operation. So I don't have a very good idea of *what* we should use for automatic thread detection, but I'm pretty sure online_cpu's should not be it. Except, like Jeff mentioned, for pack formation (which does tend to be all about CPU). Sadly, detecting what kind of filesystem you are on and how well cached it is, is really pretty hard. Even when you have OS-specific knowledge, and can look up the *type* of the filesystem, what often matters more is things like "is the filesystem on a rotational media or using flash?" etc. In the meantime I'd argue for just getting rid of the online_cpu's check, because (a) I think it's actively misleading (b) the threaded grep probably doesn't hurt much even on a single CPU, and the _potential_ upside from IO could easily dwarf the cost. (c) do developers actually have single-core machines any more? But if somebody can come up with a smarter model, that would certainly be good too. The IO advantages really don't tend to be there for rotational media, but for both flash and network filesystems, threaded IO can be a huge deal. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschukwrote: > > Maybe use the simplest version (and keep num_numbers == 0 also as flag for > all other checks in code like if(num_flags) ): > > if (list.nr || cached ) > num_threads = 0; // do not use threads > else if (num_threads == 0) > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; I will say this AGAIN. The number of threads is *not* about the number of CPU's. Stop this craziness. It's wrong. The number of threads is about parallelism. Yes, CPU's is a small part of it. But as mentioned earlier, the *big* wins are for slow filesystems, NFS in particular. On NFS, even if you have things cached, the cache revalidation is going to cause network traffic almost every time. Being able to have several of those outstanding is a big deal. So stop with the "online_cpus()" stuff. And don't base your benchmarks purely on the CPU-bound case. Because the CPU-bound case is the case that is already generally so good that few people will care all *that* deeply. Many of the things git does are not for "best-case" behavior, but to avoid bad "worst-case" situations. Look at things like the index preloading (also threaded). The big win there is - again - when the stat() calls may need IO. Sure, it can help for CPU use too, but especially on Linux, cached "stat()" calls are really quite cheap. The big upside is, again, in situations like git repositories over NFS. In the CPU-intensive case, the threading might make things go from a couple of seconds to half a second. Big deal. You're not getting up to get a coffee in either case. In the network traffic case, the threading might make things go from one minute to ten seconds. And *that* is a big deal. That's huge. That's "annoyingly slow" to "oh, this is the fastest SCM I have ever worked with in my life". That can literally be something that changes how you work for a developer. You start doing things that you simply would never otherwise do. So *none* of the threading in git is about CPU's. Maybe we should add big honking comments about that. And that big honking comment should be in the documentation for the new flag too. Because it would be really really sad if people say "I have a laptop with just two cores, so I'll set the threading option to 2", when they then work mostly over a slow wireless network and their company wants minimal local installs. The biggest reason to NOT EVER add that configuration is literally the confusion about this being about CPU's. That was what got the whole thread started, that was what the original benchmark numbers were about, and that was WRONG. So I would strongly suggest that Junio ignore these patches unless they very clearly talk about the whole IO issue. Both in the source code, in the commit messages, and in the documentation for the config option. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 09, 2015 at 09:28:12AM -0800, Victor Leschuk wrote: > Maybe use the simplest version (and keep num_numbers == 0 also as flag for > all other checks in code like if(num_flags) ): > > if (list.nr || cached ) > num_threads = 0; // do not use threads > else if (num_threads == 0) > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; That's OK. > else if (num_threads < 0) > die(...) Do we really want to die here? I think "threads < 0" works the same as "threads==0" in other git programs. It's also a weird place to die. It would make: git grep --cached --threads=-1 silently work, while: git grep --threads=-1 would die. If we do accept it, it may make sense to normalize it to 0 so that you can just check "!num_threads" elsewhere in the code. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschukwrote: > > Maybe use the simplest version (and keep num_numbers == 0 also as flag for > all other checks in code like if(num_flags) ): > > if (list.nr || cached ) > num_threads = 0; // do not use threads > else if (num_threads == 0) > num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT; > I will say this AGAIN. > The number of threads is *not* about the number of CPU's. Stop this craziness. It's wrong. Actually I have never said the nCPUs played main role in it. The patch is intended to provide user ability to change this threads number according to their needs and to touch as small amount of other code as possible. > The number of threads is about parallelism. Yes, CPU's is a small part > of it. But as mentioned earlier, the *big* wins are for slow > filesystems, NFS in particular. On NFS, even if you have things > cached, the cache revalidation is going to cause network traffic > almost every time. Being able to have several of those outstanding is > a big deal. > So stop with the "online_cpus()" stuff. And don't base your benchmarks > purely on the CPU-bound case. Because the CPU-bound case is the case > that is already generally so good that few people will care all *that* > deeply. I have performed a cold-cached FS test (in previous thread to minimize the CPU part in the results) and it showed high correlation between speed and thread_num. Isn't it what you said? Even on systems with small number of cores we can gain profit of multithreading. That's I why I suggest this param to be customizable and not HARDCODED. We need to create a clear text for the documentation that this number should not based on number of CPU-s only. Currently do not mention anything on it. -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Mon, Nov 09, 2015 at 03:36:16AM -0800, Victor Leschuk wrote: > Yeah do also think it would be more reasonable to use "0" for > "autodetect" default value. However chat this autodetect value should > be? > > For index index-pack and pack-objects we use ncpus() for this, however > according to my tests this wouldn't an Ideal for all cases. Maybe it > should be something like ncpus()*2, > anyway before it we even used hard-coded 8 for all systems... Why don't we leave it at 8, then? That's the conservative choice, and once we have --threads, people can easily experiment with different values and we can follow-up with a change to the default if need be. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
> Why don't we leave it at 8, then? That's the conservative choice, and > once we have --threads, people can easily experiment with different > values and we can follow-up with a change to the default if need be. I'd propose the following: if (list.nr || cached) { num_threads = 0; /* Can not multi-thread object lookup */ } else if (num_threads < 0 && online_cpus() <= 1) { num_threads = 0; /* User didn't set threading option and we have <= 1 of hardware cores */ } else if (num_threads == 0) { num_threads = GREP_NUM_THREADS_DEFAULT; /* User explicitly choose default behavior */ } else if (num_threads < 0) { /* Actually this one should be checked earlier so no need to double check here */ die(_("Ivalid number of threads specified (%d)"), num_threads) } -- Victor -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
. > In the meantime I'd argue for just getting rid of the online_cpu's > check, because > (a) I think it's actively misleading > (b) the threaded grep probably doesn't hurt much even on a single > CPU, and the _potential_ upside from IO could easily dwarf the cost. > (c) do developers actually have single-core machines any more? Linus So as far as I understood your point at current moment would be better to leave online_cpus() check behind, keep the default threads value (and do so for other threaded programs, in separate patches of course). After that we may focus (in future) on developing smarter heuristics for parallelity-relationed params. Also we should specify in documentation that number of your CPUs may not the optimal value and customer should find his own best values based on other circumstances. Correct? -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
From: Jeff King [p...@peff.net] Sent: Tuesday, November 03, 2015 22:40 To: Junio C Hamano Cc: Victor Leschuk; git@vger.kernel.org; Victor Leschuk; torva...@linux-foundation.org; j...@keeping.me.uk Subject: Re: [PATCH v4] Add git-grep threads param On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote: > > +grep.threads:: > > + Number of grep worker threads, use it to tune up performance on > > + multicore machines. Default value is 8. Set to 0 to disable threading. > > + > > I am not enthused by this "Set to 0 to disable". As Zero is > magical, it would be more useful if 1 meant that threading is not > used (i.e. there is only 1 worker), and 0 meant that we would > automatically pick some reasonable parallelism for you (and we > promise that the our choice would not be outrageously wrong), or > something like that. > Not just useful, but consistent with other parts of git, like > pack.threads, where "0" already means "autodetect". Hello Peff and Junio, Yeah do also think it would be more reasonable to use "0" for "autodetect" default value. However chat this autodetect value should be? For index index-pack and pack-objects we use ncpus() for this, however according to my tests this wouldn't an Ideal for all cases. Maybe it should be something like ncpus()*2, anyway before it we even used hard-coded 8 for all systems... In this case we use "1" as "Do not use threads" and die on all negative numbers during parsing. Agreed? -- Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
>> do we have any objections on this patch? > The question you should be asking is "do we have any support". Hello all, CCing participated reviewers. As Junio has correctly mentioned: "Do we have any support for including this functionality?" I think this kind of customization can be useful in tuning up performance as confirmed by conducted tests. And in most cases having anything hard-coded is worse than giving users opportunity to change it, isn't it? -- Best Regards, Victor-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
Victor Leschukwrites: > Make number of git-grep worker threads a configuration parameter. > According to several tests on systems with different number of CPU cores > the hard-coded number of 8 threads is not optimal for all systems: > tuning this parameter can significantly speed up grep performance. > > Signed-off-by: Victor Leschuk > --- > Documentation/config.txt | 4 +++ > Documentation/git-grep.txt | 9 ++ > builtin/grep.c | 56 > -- > contrib/completion/git-completion.bash | 1 + > 4 files changed, 54 insertions(+), 16 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 391a0c3..1dd2a61 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: > option is ignored when the 'grep.patternType' option is set to a value > other than 'default'. > > +grep.threads:: > + Number of grep worker threads, use it to tune up performance on > + multicore machines. Default value is 8. Set to 0 to disable threading. > + I am not enthused by this "Set to 0 to disable". As Zero is magical, it would be more useful if 1 meant that threading is not used (i.e. there is only 1 worker), and 0 meant that we would automatically pick some reasonable parallelism for you (and we promise that the our choice would not be outrageously wrong), or something like that. Of course, we can do grep.threads=auto to make it even more explicit, but I'd imagine that the in-core code to parse the option and config to the num_threads variable would need one "int" value to represent that "auto", and 0 would be a natural choice for that. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
On Tue, Nov 03, 2015 at 09:22:09AM -0800, Junio C Hamano wrote: > > +grep.threads:: > > + Number of grep worker threads, use it to tune up performance on > > + multicore machines. Default value is 8. Set to 0 to disable threading. > > + > > I am not enthused by this "Set to 0 to disable". As Zero is > magical, it would be more useful if 1 meant that threading is not > used (i.e. there is only 1 worker), and 0 meant that we would > automatically pick some reasonable parallelism for you (and we > promise that the our choice would not be outrageously wrong), or > something like that. Not just useful, but consistent with other parts of git, like pack.threads, where "0" already means "autodetect". -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Add git-grep threads param
Victor Leschukwrites: > do we have any objections on this patch? The question you should be asking is "do we have any support". It is not like the default for any series is to be included; it is quite the opposite. "Is this worth having in our tree?" is the question we all ask ourselves. Also I think you should have CC'ed those who gave review comments on this topic in the earlier rounds, if you wanted to receive answers to that question (whichever one it is ;-). Having said that, I didn't immediately spot anything objectionable. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4] Add git-grep threads param
Hello all, do we have any objections on this patch? -- Best Regards, Victor From: Victor Leschuk [vlesc...@gmail.com] Sent: Tuesday, October 27, 2015 14:22 To: git@vger.kernel.org Cc: Victor Leschuk Subject: [PATCH v4] Add git-grep threads param Make number of git-grep worker threads a configuration parameter. According to several tests on systems with different number of CPU cores the hard-coded number of 8 threads is not optimal for all systems: tuning this parameter can significantly speed up grep performance. Signed-off-by: Victor Leschuk <vlesc...@accesssoftek.com> --- Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 9 ++ builtin/grep.c | 56 -- contrib/completion/git-completion.bash | 1 + 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1dd2a61 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..e766596 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +232,10 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Set number of worker threads to . Default is 8. + Set to 0 to disable threading. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..694553e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = -1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init([i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], ); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(_mutex); pthread_mutex_destroy(_read_mutex); pthread_mutex_destroy(_attr_mutex); @@ -262,10 +265,22 @@ static int wait_all(void) } #endif +static int grep_threads_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "grep.threads")) { + num_threads = git_config_int(var, value); + if (num_threads < 0) + die("Invalid number of threads specified (%d)", num_threads); + } + return 0; +} + static int grep_cmd_config(const char *var, const char *value, vo
[PATCH v4] Add git-grep threads param
Make number of git-grep worker threads a configuration parameter. According to several tests on systems with different number of CPU cores the hard-coded number of 8 threads is not optimal for all systems: tuning this parameter can significantly speed up grep performance. Signed-off-by: Victor Leschuk--- Documentation/config.txt | 4 +++ Documentation/git-grep.txt | 9 ++ builtin/grep.c | 56 -- contrib/completion/git-completion.bash | 1 + 4 files changed, 54 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 391a0c3..1dd2a61 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1447,6 +1447,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + gpg.program:: Use this custom program instead of "gpg" found on $PATH when making or verifying a PGP signature. The program must support the diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 4a44d6d..e766596 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -23,6 +23,7 @@ SYNOPSIS [--break] [--heading] [-p | --show-function] [-A ] [-B ] [-C ] [-W | --function-context] + [--threads ] [-f ] [-e] [--and|--or|--not|(|)|-e ...] [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...] @@ -53,6 +54,10 @@ grep.extendedRegexp:: option is ignored when the 'grep.patternType' option is set to a value other than 'default'. +grep.threads:: + Number of grep worker threads, use it to tune up performance on + multicore machines. Default value is 8. Set to 0 to disable threading. + grep.fullName:: If set to true, enable '--full-name' option by default. @@ -227,6 +232,10 @@ OPTIONS effectively showing the whole function in which the match was found. +--threads :: + Set number of worker threads to . Default is 8. + Set to 0 to disable threading. + -f :: Read patterns from , one per line. diff --git a/builtin/grep.c b/builtin/grep.c index d04f440..694553e 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -24,11 +24,11 @@ static char const * const grep_usage[] = { NULL }; -static int use_threads = 1; +#define GREP_NUM_THREADS_DEFAULT 8 +static int num_threads = -1; #ifndef NO_PTHREADS -#define THREADS 8 -static pthread_t threads[THREADS]; +static pthread_t *threads; /* We use one producer thread and THREADS consumer * threads. The producer adds struct work_items to 'todo' and the @@ -63,13 +63,13 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (use_threads) + if (num_threads) pthread_mutex_lock(_mutex); } static inline void grep_unlock(void) { - if (use_threads) + if (num_threads) pthread_mutex_unlock(_mutex); } @@ -206,7 +206,8 @@ static void start_threads(struct grep_opt *opt) strbuf_init([i].out, 0); } - for (i = 0; i < ARRAY_SIZE(threads); i++) { + threads = xcalloc(num_threads, sizeof(pthread_t)); + for (i = 0; i < num_threads; i++) { int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; @@ -238,12 +239,14 @@ static int wait_all(void) pthread_cond_broadcast(_add); grep_unlock(); - for (i = 0; i < ARRAY_SIZE(threads); i++) { + for (i = 0; i < num_threads; i++) { void *h; pthread_join(threads[i], ); hit |= (int) (intptr_t) h; } + free(threads); + pthread_mutex_destroy(_mutex); pthread_mutex_destroy(_read_mutex); pthread_mutex_destroy(_attr_mutex); @@ -262,10 +265,22 @@ static int wait_all(void) } #endif +static int grep_threads_config(const char *var, const char *value, void *cb) +{ + if (!strcmp(var, "grep.threads")) { + num_threads = git_config_int(var, value); + if (num_threads < 0) + die("Invalid number of threads specified (%d)", num_threads); + } + return 0; +} + static int grep_cmd_config(const char *var, const char *value, void *cb) { int st = grep_config(var, value, cb); - if (git_color_default_config(var, value, cb) < 0) + if (grep_threads_config(var, value, cb) < 0) + st = -1; + else if (git_color_default_config(var, value, cb) < 0) st = -1; return st; } @@ -294,7 +309,7 @@ static int grep_sha1(struct grep_opt *opt,