Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
Jeff King p...@peff.net wrote on 02.11.2012 16:26:16: On Tue, Oct 30, 2012 at 10:50:42AM +0100, karsten.bl...@dcon.de wrote: 'update-index --refresh' and 'diff-index' (without --cached) don't honor the core.preloadindex setting yet. Porcelain commands using these (such as git [svn] rebase) suffer from this, especially on Windows. Use read_cache_preload to improve performance. Additionally, in builtin/diff.c, don't preload index status if we don't access the working copy (--cached). Results with msysgit on WebKit repo (2GB in 200k files): | update-index | diff-index | rebase +--++- msysgit-v1.8.0 | 9.157s |10.536s | 42.791s + preloadindex | 9.157s |10.536s | 28.725s + this patch| 2.329s | 2.752s | 15.152s + fscache [1] | 0.731s | 1.171s | 8.877s Cool numbers. On my quad-core SSD Linux box, I saw a few speedups, too. Here are the numbers for update-index --refresh on the WebKit repo (all are wall clock time, best-of-five): | before | after ---++ cold cache | 4.513s | 2.059s warm cache | 0.252s | 0.164s Great, and thanks for testing on Linux (I only have Linux VMs for testing, and I couldn't get meaningful performance data from those). Not as dramatic, but still nice. I wonder how a spinning disk would fare on the cold-cache case, though. I also tried it with all but one CPU disabled, and the warm cache case was a little bit slower. Unfortunately, with a 'real' disk, cold cache times increase with the number of threads. I've played around with '#define MAX_PARALLEL 20' in preload-index.c, with the following results ('git update-index --refresh' on WebKit repo, Win7 x64, Core i7 860 (2.8GHz 4 Core HT), WD VelociRaptor (300G 10krpm 8ms), msysgit 1.8.0 + preload-index patch + fscache patch): MAX_PARALLEL | cold cache | warm cache -++ no preload | 49.938 | 9.204 (*) 1 | 45.412 | 1.622 2 | 55.334 | 1.123 3 | 65.973 | 0.982 4 | 67.579 | 0.889 5 | 76.159 | 0.827 6 | 81.510 | 0.811 7 | 86.269 | 0.858 8 | 85.862 | 0.827 ... 10 | 87.953 | 0.717 ... 20 |176.251 | 0.749 (*) core.preloadindex currently also disables fscache, thus the 9s With more threads, Windows resource monitor also shows increasing disk queue length and response times. It seems clear that more concurrent disk seeks hurt cold cache performance pretty badly. On the other hand, warm cache improvements for #threads #cores are only about 10 - 20%. I don't know if Linux is better at caching / prefetching directory listings (might depend on file system, too), but perhaps MAX_PARALLEL should be set to a more reasonable value, or be made configurable (e.g. core.preloadIndexThreads)? Karsten -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
Jeff King p...@peff.net wrote on 02.11.2012 16:38:00: On Fri, Nov 02, 2012 at 11:26:16AM -0400, Jeff King wrote: Still, I don't think we need to worry about performance regressions, because people who don't have a setup suitable for it will not turn on core.preloadindex in the first place. And if they have it on, the more places we use it, probably the better. BTW, your patch was badly damaged in transit (wrapped, and tabs converted to spaces). I was able to fix it up, but please check your mailer's settings. Yes, I feared as much, that's why I included the pull URL (the company MTA only accepts outbound mail from Notes clients, sorry). Is there a policy for people with broken mailers (send patch as attachment, add pull URL more prominently, don't include plaintext patch at all...)? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
karsten.bl...@dcon.de writes: Jeff King p...@peff.net wrote on 02.11.2012 16:38:00: On Fri, Nov 02, 2012 at 11:26:16AM -0400, Jeff King wrote: Still, I don't think we need to worry about performance regressions, because people who don't have a setup suitable for it will not turn on core.preloadindex in the first place. And if they have it on, the more places we use it, probably the better. BTW, your patch was badly damaged in transit (wrapped, and tabs converted to spaces). I was able to fix it up, but please check your mailer's settings. Yes, I feared as much, that's why I included the pull URL (the company MTA only accepts outbound mail from Notes clients, sorry). Is there a policy for people with broken mailers (send patch as attachment, add pull URL more prominently, don't include plaintext patch at all...)? If anything, fix your mailer probably is the policy you are looking for, I think. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
Am 13.11.2012 17:46, schrieb Junio C Hamano: karsten.bl...@dcon.de writes: If anything, fix your mailer probably is the policy you are looking for, I think. Well then...I've cloned myself @gmail, I hope this is better. Just some provoking thoughts...(if I may): RFC-5322 recommends wrapping lines at 78, and mail relays and gateways are allowed to change message content according to the capabilities of the receiver (RFC-5598). In essence, plaintext mail is completely unsuitable for preformatted text such as source code. On the other hand, git tries to solve the very problem of distributed source code management, and consistency by strong sha-1 checksums is on the top of the feature list. It is somehow hard to believe that contributing to git itself should only be possible using the most unreliable of protocols. Don't you trust your own software? -- 8 -- Subject: [PATCH] update-index/diff-index: use core.preloadindex to improve performance 'update-index --refresh' and 'diff-index' (without --cached) don't honor the core.preloadindex setting yet. Porcelain commands using these (such as git [svn] rebase) suffer from this, especially on Windows. Use read_cache_preload to improve performance. Additionally, in builtin/diff.c, don't preload index status if we don't access the working copy (--cached). Results with msysgit on WebKit repo (2GB in 200k files): | update-index | diff-index | rebase +--++- msysgit-v1.8.0 | 9.157s |10.536s | 42.791s + preloadindex | 9.157s |10.536s | 28.725s + this patch| 2.329s | 2.752s | 15.152s + fscache [1] | 0.731s | 1.171s | 8.877s [1] https://github.com/kblees/git/tree/kb/fscache-v3 Thanks-to: Albert Krawczyk pro-lo...@optusnet.com.au Signed-off-by: Karsten Blees bl...@dcon.de --- Can also be pulled from: https://github.com/kblees/git/tree/kb/update-diff-index-preload-upstream More performance figures (for msysgit) can be found in this discussion: https://github.com/pro-logic/git/commit/32c03dd8 builtin/diff-index.c | 8 ++-- builtin/diff.c | 12 builtin/update-index.c | 1 + 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/builtin/diff-index.c b/builtin/diff-index.c index 2eb32bd..1c737f7 100644 --- a/builtin/diff-index.c +++ b/builtin/diff-index.c @@ -41,9 +41,13 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) if (rev.pending.nr != 1 || rev.max_count != -1 || rev.min_age != -1 || rev.max_age != -1) usage(diff_cache_usage); - if (!cached) + if (!cached) { setup_work_tree(); - if (read_cache() 0) { + if (read_cache_preload(rev.diffopt.pathspec.raw) 0) { + perror(read_cache_preload); + return -1; + } + } else if (read_cache() 0) { perror(read_cache); return -1; } diff --git a/builtin/diff.c b/builtin/diff.c index 9650be2..198b921 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -130,8 +130,6 @@ static int builtin_diff_index(struct rev_info *revs, usage(builtin_diff_usage); argv++; argc--; } - if (!cached) - setup_work_tree(); /* * Make sure there is one revision (i.e. pending object), * and there is no revision filtering parameters. @@ -140,8 +138,14 @@ static int builtin_diff_index(struct rev_info *revs, revs-max_count != -1 || revs-min_age != -1 || revs-max_age != -1) usage(builtin_diff_usage); - if (read_cache_preload(revs-diffopt.pathspec.raw) 0) { - perror(read_cache_preload); + if (!cached) { + setup_work_tree(); + if (read_cache_preload(revs-diffopt.pathspec.raw) 0) { + perror(read_cache_preload); + return -1; + } + } else if (read_cache() 0) { + perror(read_cache); return -1; } return run_diff_index(revs, cached); diff --git a/builtin/update-index.c b/builtin/update-index.c index 74986bf..ada1dff 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -593,6 +593,7 @@ struct refresh_params { static int refresh(struct refresh_params *o, unsigned int flag) { setup_work_tree(); + read_cache_preload(NULL); *o-has_errors |= refresh_cache(o-flags | flag); return 0; } -- 1.8.0.msysgit.0.3.g7d9d98c -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
On Tue, Oct 30, 2012 at 10:50:42AM +0100, karsten.bl...@dcon.de wrote: 'update-index --refresh' and 'diff-index' (without --cached) don't honor the core.preloadindex setting yet. Porcelain commands using these (such as git [svn] rebase) suffer from this, especially on Windows. Use read_cache_preload to improve performance. Additionally, in builtin/diff.c, don't preload index status if we don't access the working copy (--cached). Results with msysgit on WebKit repo (2GB in 200k files): | update-index | diff-index | rebase +--++- msysgit-v1.8.0 | 9.157s |10.536s | 42.791s + preloadindex | 9.157s |10.536s | 28.725s + this patch| 2.329s | 2.752s | 15.152s + fscache [1] | 0.731s | 1.171s | 8.877s Cool numbers. On my quad-core SSD Linux box, I saw a few speedups, too. Here are the numbers for update-index --refresh on the WebKit repo (all are wall clock time, best-of-five): | before | after ---++ cold cache | 4.513s | 2.059s warm cache | 0.252s | 0.164s Not as dramatic, but still nice. I wonder how a spinning disk would fare on the cold-cache case, though. I also tried it with all but one CPU disabled, and the warm cache case was a little bit slower. Still, I don't think we need to worry about performance regressions, because people who don't have a setup suitable for it will not turn on core.preloadindex in the first place. And if they have it on, the more places we use it, probably the better. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
On Fri, Nov 02, 2012 at 11:26:16AM -0400, Jeff King wrote: Still, I don't think we need to worry about performance regressions, because people who don't have a setup suitable for it will not turn on core.preloadindex in the first place. And if they have it on, the more places we use it, probably the better. BTW, your patch was badly damaged in transit (wrapped, and tabs converted to spaces). I was able to fix it up, but please check your mailer's settings. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] update-index/diff-index: use core.preloadindex to improve performance
'update-index --refresh' and 'diff-index' (without --cached) don't honor the core.preloadindex setting yet. Porcelain commands using these (such as git [svn] rebase) suffer from this, especially on Windows. Use read_cache_preload to improve performance. Additionally, in builtin/diff.c, don't preload index status if we don't access the working copy (--cached). Results with msysgit on WebKit repo (2GB in 200k files): | update-index | diff-index | rebase +--++- msysgit-v1.8.0 | 9.157s |10.536s | 42.791s + preloadindex | 9.157s |10.536s | 28.725s + this patch| 2.329s | 2.752s | 15.152s + fscache [1] | 0.731s | 1.171s | 8.877s [1] https://github.com/kblees/git/tree/kb/fscache-v3 Thanks-to: Albert Krawczyk pro-lo...@optusnet.com.au Signed-off-by: Karsten Blees bl...@dcon.de --- Can also be pulled from: https://github.com/kblees/git/tree/kb/update-diff-index-preload-upstream I thought I might send this upstream directly, as its not msysgit related. More performance figures (for msysgit) can be found in this discussion: https://github.com/pro-logic/git/commit/32c03dd8 Ciao, Karsten builtin/diff-index.c | 8 ++-- builtin/diff.c | 12 builtin/update-index.c | 1 + 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/builtin/diff-index.c b/builtin/diff-index.c index 2eb32bd..1c737f7 100644 --- a/builtin/diff-index.c +++ b/builtin/diff-index.c @@ -41,9 +41,13 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix) if (rev.pending.nr != 1 || rev.max_count != -1 || rev.min_age != -1 || rev.max_age != -1) usage(diff_cache_usage); - if (!cached) + if (!cached) { setup_work_tree(); - if (read_cache() 0) { + if (read_cache_preload(rev.diffopt.pathspec.raw) 0) { + perror(read_cache_preload); + return -1; + } + } else if (read_cache() 0) { perror(read_cache); return -1; } diff --git a/builtin/diff.c b/builtin/diff.c index 9650be2..198b921 100644 --- a/builtin/diff.c +++ b/builtin/diff.c @@ -130,8 +130,6 @@ static int builtin_diff_index(struct rev_info *revs, usage(builtin_diff_usage); argv++; argc--; } - if (!cached) - setup_work_tree(); /* * Make sure there is one revision (i.e. pending object), * and there is no revision filtering parameters. @@ -140,8 +138,14 @@ static int builtin_diff_index(struct rev_info *revs, revs-max_count != -1 || revs-min_age != -1 || revs-max_age != -1) usage(builtin_diff_usage); - if (read_cache_preload(revs-diffopt.pathspec.raw) 0) { - perror(read_cache_preload); + if (!cached) { + setup_work_tree(); + if (read_cache_preload(revs-diffopt.pathspec.raw) 0) { + perror(read_cache_preload); + return -1; + } + } else if (read_cache() 0) { + perror(read_cache); return -1; } return run_diff_index(revs, cached); diff --git a/builtin/update-index.c b/builtin/update-index.c index 74986bf..ada1dff 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -593,6 +593,7 @@ struct refresh_params { static int refresh(struct refresh_params *o, unsigned int flag) { setup_work_tree(); + read_cache_preload(NULL); *o-has_errors |= refresh_cache(o-flags | flag); return 0; } -- 1.8.0.msysgit.0.3.g7d9d98c -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] update-index/diff-index: use core.preloadindex to improve performance
On Tue, Oct 30, 2012 at 10:50 AM, karsten.bl...@dcon.de wrote: 'update-index --refresh' and 'diff-index' (without --cached) don't honor the core.preloadindex setting yet. Porcelain commands using these (such as git [svn] rebase) suffer from this, especially on Windows. Use read_cache_preload to improve performance. Additionally, in builtin/diff.c, don't preload index status if we don't access the working copy (--cached). Results with msysgit on WebKit repo (2GB in 200k files): | update-index | diff-index | rebase +--++- msysgit-v1.8.0 | 9.157s |10.536s | 42.791s + preloadindex | 9.157s |10.536s | 28.725s + this patch| 2.329s | 2.752s | 15.152s + fscache [1] | 0.731s | 1.171s | 8.877s Wow, awesome results :) This also makes me want to play around with the fscache stuff a bit; about an order of magnitude improvement is quite noticeable :) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html