Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> OK. Looks like a good performance vs. codesize tradeoff.

Yes, but IMO this should be done in the generic code, unrolling small loops is
profitable on most architectures.

--
Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Sat, Nov 22, 2014 at 10:49 AM, Eric Botcazou ebotca...@adacore.com wrote:
>> OK. Looks like a good performance vs. codesize tradeoff.
>
> Yes, but IMO this should be done in the generic code, unrolling small loops is
> profitable on most architectures.

Yeah, but after a couple of pings for a generic change, we went the target way.

Uros.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> Yeah, but after a couple of pings for a generic change, we went the target way.

That's a bit of a shame: the 400 -> 100 change was very likely tested only on
x86-64 and nevertheless applied to the generic code, so the fix repairing the
damage should also be applied to the generic code.

--
Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On November 22, 2014 12:24:22 PM CET, Eric Botcazou ebotca...@adacore.com wrote:
>> Yeah, but after a couple of pings for a generic change, we went the target way.
>
> That's a bit of a shame: the 400 -> 100 change was very likely tested only on
> x86-64 and nevertheless applied to the generic code, so the fix repairing the
> damage should also be applied to the generic code.

A patch to bump the generic limit is OK. Targets that don't want it can reduce
it in target-specific code.

Richard.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Sat, Nov 22, 2014 at 7:38 PM, Richard Biener richard.guent...@gmail.com wrote:
> On November 22, 2014 12:24:22 PM CET, Eric Botcazou ebotca...@adacore.com wrote:
>>> Yeah, but after a couple of pings for a generic change, we went the target way.
>>
>> That's a bit of a shame: the 400 -> 100 change was very likely tested only on
>> x86-64 and nevertheless applied to the generic code, so the fix repairing the
>> damage should also be applied to the generic code.
>
> A patch to bump the generic limit is OK. Targets that don't want it can reduce
> it in target-specific code.

I have committed the attached patch:

2014-11-22  Uros Bizjak  ubiz...@gmail.com

    * params.def (PARAM_MAX_COMPLETELY_PEELED_INSNS): Increase to 200.
    * config/i386/i386.c (ix86_option_override_internal): Do not increase
    PARAM_MAX_COMPLETELY_PEELED_INSNS.

Bootstrapped on x86_64-linux-gnu.

Uros.

Index: params.def
===================================================================
--- params.def (revision 217961)
+++ params.def (working copy)
@@ -303,7 +303,7 @@ DEFPARAM(PARAM_MAX_PEEL_BRANCHES,
 DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
         "max-completely-peeled-insns",
         "The maximum number of insns of a completely peeled loop",
-        100, 0, 0)
+        200, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely.  */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
         "max-completely-peel-times",
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 217961)
+++ config/i386/i386.c (working copy)
@@ -4142,12 +4142,6 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
-  /* Increase full peel max insns parameter for x86.  */
-  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
-                        200,
-                        opts->x_param_values,
-                        opts_set->x_param_values);
-
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
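For readers unfamiliar with why moving the value from i386.c to params.def still leaves room for target or user overrides, here is a minimal sketch of the "set only if not explicitly set" behaviour that maybe_set_param_value provides. It is a toy model, not GCC's code: every `toy_*` / `TOY_*` name is hypothetical, and the real function operates on the `opts->x_param_values` / `opts_set->x_param_values` arrays seen in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's parameter enumeration.  */
enum toy_param { TOY_MAX_COMPLETELY_PEELED_INSNS, TOY_PARAM_COUNT };

/* Toy parameter table: a value plus a "user chose this" flag per entry.  */
struct toy_params {
  int value[TOY_PARAM_COUNT];
  bool user_set[TOY_PARAM_COUNT];
};

/* Start from the generic default (the role played by params.def).  */
void toy_init(struct toy_params *p, int generic_default)
{
  for (int i = 0; i < TOY_PARAM_COUNT; i++) {
    p->value[i] = generic_default;
    p->user_set[i] = false;
  }
}

/* Models an explicit --param name=value on the command line.  */
void toy_set_from_command_line(struct toy_params *p, enum toy_param id, int v)
{
  assert(id >= 0 && id < TOY_PARAM_COUNT);
  p->value[id] = v;
  p->user_set[id] = true;
}

/* Models a target hook: change the default only when the user has not
   picked a value explicitly.  */
void toy_maybe_set(struct toy_params *p, enum toy_param id, int v)
{
  assert(id >= 0 && id < TOY_PARAM_COUNT);
  if (!p->user_set[id])
    p->value[id] = v;
}
```

In this model a target that dislikes the generic 200 would call `toy_maybe_set` with a smaller number, while a user-supplied value always wins.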
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
PING.
200 currently looks optimal for x86.
Let's commit the following:

2014-11-21  Evgeny Stupachenko  evstu...@gmail.com

    * config/i386/i386.c (ix86_option_override_internal): Increase
    PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        200,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch

On Wed, Nov 12, 2014 at 5:02 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> Code size for spec2000 is almost unchanged (many benchmarks have the same
> binaries).
> For those that are changed we have the following numbers (200 vs 100,
> both dynamic build -Ofast -funroll-loops -flto):
> 183.equake +10%
> 164.gzip, 173.applu +3.5%
> 187.facerec, 191.fma3d +2.5%
> 200.sixtrack +2%
> 177.mesa, 178.galgel +1%
>
> On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka hubi...@ucw.cz wrote:
>>>> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>>> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>>> A higher value of 300 leaves the performance of the mentioned tests
>>>> unchanged, but adds some regressions on other benchmarks.
>>>>
>>>> So I like 200 as well as 120 and 150, but can confirm performance gains
>>>> only for x86.
>>>
>>> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
>>> this gave the performance boost without affecting the code size (on x86-64)
>>> and because this was previously 400, but it's your call.
>>
>> Both 150 or 200 globally work for me if there is not too much of code size
>> bloat (did not see code size mentioned here).
>>
>> What I did before decreasing the bounds was strengthening the loop iteration
>> count bounds and adding logic that predicts constant propagation enabled by
>> unrolling. For this reason 400 became too large as we did a lot more complete
>> unrolling than before. Also 400 in older compilers is not really 400 in newer.
>>
>> Because I saw performance drop only with values below 50, I went for 100.
>> It would be very interesting to actually analyze what happens for those two
>> benchmarks (that should not be too hard with perf).
>>
>> Honza
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Fri, Nov 21, 2014 at 11:46 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
> PING.
> 200 currently looks optimal for x86.
> Let's commit the following:
>
> 2014-11-21  Evgeny Stupachenko  evstu...@gmail.com
>
>     * config/i386/i386.c (ix86_option_override_internal): Increase
>     PARAM_MAX_COMPLETELY_PEELED_INSNS.

OK. Looks like a good performance vs. codesize tradeoff.

Uros.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
Code size for spec2000 is almost unchanged (many benchmarks have the same
binaries).
For those that are changed we have the following numbers (200 vs 100,
both dynamic build -Ofast -funroll-loops -flto):
183.equake +10%
164.gzip, 173.applu +3.5%
187.facerec, 191.fma3d +2.5%
200.sixtrack +2%
177.mesa, 178.galgel +1%

On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka hubi...@ucw.cz wrote:
>>> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>>> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>>> A higher value of 300 leaves the performance of the mentioned tests
>>> unchanged, but adds some regressions on other benchmarks.
>>>
>>> So I like 200 as well as 120 and 150, but can confirm performance gains
>>> only for x86.
>>
>> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
>> this gave the performance boost without affecting the code size (on x86-64)
>> and because this was previously 400, but it's your call.
>
> Both 150 or 200 globally work for me if there is not too much of code size
> bloat (did not see code size mentioned here).
>
> What I did before decreasing the bounds was strengthening the loop iteration
> count bounds and adding logic that predicts constant propagation enabled by
> unrolling. For this reason 400 became too large as we did a lot more complete
> unrolling than before. Also 400 in older compilers is not really 400 in newer.
>
> Because I saw performance drop only with values below 50, I went for 100.
> It would be very interesting to actually analyze what happens for those two
> benchmarks (that should not be too hard with perf).
>
> Honza
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
150 and 200 make Silvermont performance better on 173.applu (+8%) and
183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
A higher value of 300 leaves the performance of the mentioned tests
unchanged, but adds some regressions on other benchmarks.

So I like 200 as well as 120 and 150, but can confirm performance gains
only for x86.

On Fri, Nov 7, 2014 at 6:37 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> So are there any objections to enabling this
> (PARAM_MAX_COMPLETELY_PEELED_INSNS increase from 100 to 120) for x86?
>
> On Fri, Oct 31, 2014 at 7:52 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> I've measured spec2000, spec2006 as well and EEMBC for Silvermont in addition.
>> The 100 -> 120 change gives a gain for Silvermont; the results on Haswell are flat.
>>
>> On Fri, Oct 31, 2014 at 3:14 PM, Eric Botcazou ebotca...@adacore.com wrote:
>>>> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
>>>> completely artificially. I do not recall any serious tuning of this flag.
>>>
>>> Are you talking about PARAM_MAX_COMPLETELY_PEELED_INSNS here? If so, see:
>>> https://gcc.gnu.org/ml/gcc-patches/2012-11/msg01193.html
>>>
>>> We have experienced performance regressions because of this arbitrary change
>>> and bumped it back to 200 unconditionally.
>>>
>>> --
>>> Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
> A higher value of 300 leaves the performance of the mentioned tests
> unchanged, but adds some regressions on other benchmarks.
>
> So I like 200 as well as 120 and 150, but can confirm performance gains
> only for x86.

IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
this gave the performance boost without affecting the code size (on x86-64)
and because this was previously 400, but it's your call.

--
Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
>> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>> A higher value of 300 leaves the performance of the mentioned tests
>> unchanged, but adds some regressions on other benchmarks.
>>
>> So I like 200 as well as 120 and 150, but can confirm performance gains
>> only for x86.
>
> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler because
> this gave the performance boost without affecting the code size (on x86-64)
> and because this was previously 400, but it's your call.

Both 150 or 200 globally work for me if there is not too much of code size
bloat (did not see code size mentioned here).

What I did before decreasing the bounds was strengthening the loop iteration
count bounds and adding logic that predicts constant propagation enabled by
unrolling. For this reason 400 became too large as we did a lot more complete
unrolling than before. Also 400 in older compilers is not really 400 in newer.

Because I saw performance drop only with values below 50, I went for 100.
It would be very interesting to actually analyze what happens for those two
benchmarks (that should not be too hard with perf).

Honza
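As background for the numbers being discussed, the following sketch shows what complete peeling does to a loop with a small constant trip count, plus a deliberately naive size check. The `toy_fits_peel_budget` helper is hypothetical and is not GCC's estimator: as described above, the real heuristic also predicts what constant propagation will remove after unrolling, which is why the same limit means different things in different compiler versions. The limit itself can be tried from the command line with `--param max-completely-peeled-insns=200`.

```c
#include <assert.h>

enum { TRIP_COUNT = 4 };

/* Rolled form: one compare-and-branch per iteration.  */
int dot4_rolled(const int *a, const int *b)
{
  int sum = 0;
  for (int i = 0; i < TRIP_COUNT; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Hand-written equivalent of the same loop after complete peeling:
   the loop-control branches are gone, at the cost of a larger body.  */
int dot4_peeled(const int *a, const int *b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Hypothetical, simplified budget check: peel only if the body replicated
   trip_count times stays within the limit.  */
int toy_fits_peel_budget(int body_insns, int trip_count, int max_peeled_insns)
{
  assert(body_insns >= 0 && trip_count >= 0);
  return body_insns * trip_count <= max_peeled_insns;
}
```

Under this toy model a 30-instruction body with 5 iterations is rejected at a limit of 100 and accepted at 200, which is the kind of loop the change from 100 to 200 is meant to reach.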
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
So are there any objections to enabling this
(PARAM_MAX_COMPLETELY_PEELED_INSNS increase from 100 to 120) for x86?

On Fri, Oct 31, 2014 at 7:52 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> I've measured spec2000, spec2006 as well and EEMBC for Silvermont in addition.
> The 100 -> 120 change gives a gain for Silvermont; the results on Haswell are flat.
>
> On Fri, Oct 31, 2014 at 3:14 PM, Eric Botcazou ebotca...@adacore.com wrote:
>>> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
>>> completely artificially. I do not recall any serious tuning of this flag.
>>
>> Are you talking about PARAM_MAX_COMPLETELY_PEELED_INSNS here? If so, see:
>> https://gcc.gnu.org/ml/gcc-patches/2012-11/msg01193.html
>>
>> We have experienced performance regressions because of this arbitrary change
>> and bumped it back to 200 unconditionally.
>>
>> --
>> Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
> completely artificially. I do not recall any serious tuning of this flag.

Are you talking about PARAM_MAX_COMPLETELY_PEELED_INSNS here? If so, see:
https://gcc.gnu.org/ml/gcc-patches/2012-11/msg01193.html

We have experienced performance regressions because of this arbitrary change
and bumped it back to 200 unconditionally.

--
Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
I've measured spec2000, spec2006 as well and EEMBC for Silvermont in addition.
The 100 -> 120 change gives a gain for Silvermont; the results on Haswell are flat.

On Fri, Oct 31, 2014 at 3:14 PM, Eric Botcazou ebotca...@adacore.com wrote:
>> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
>> completely artificially. I do not recall any serious tuning of this flag.
>
> Are you talking about PARAM_MAX_COMPLETELY_PEELED_INSNS here? If so, see:
> https://gcc.gnu.org/ml/gcc-patches/2012-11/msg01193.html
>
> We have experienced performance regressions because of this arbitrary change
> and bumped it back to 200 unconditionally.
>
> --
> Eric Botcazou
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Tue, Oct 28, 2014 at 1:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> make check for gcc passed
>
> On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> The results are the same for Silvermont.
>> There are no significant changes on Haswell.
>> So I agree with Richard, let's enable this x86 wide.
>>
>> Bootstrap/ passed.
>> Make check in progress.
>> Is it ok?
>>
>> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>>
>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>     PARAM_MAX_COMPLETELY_PEELED_INSNS.

Let's wait for Honza's approval ...

Uros.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> On Tue, Oct 28, 2014 at 1:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> make check for gcc passed
>>
>> On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>> The results are the same for Silvermont.
>>> There are no significant changes on Haswell.
>>> So I agree with Richard, let's enable this x86 wide.
>>>
>>> Bootstrap/ passed.
>>> Make check in progress.
>>> Is it ok?
>>>
>>> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>>>
>>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>>     PARAM_MAX_COMPLETELY_PEELED_INSNS.
>
> Let's wait for Honza's approval ...

Looking through the emails, it is not clear to me whether you re-tested that
this still gives the intended speedup with the tree-level loop peeling
(committed 2014-10-14).
If it still works as intended, I do not think we have any reason not to change
the default in params.def, given that even ARM folks are calling for peeling
by default.

Honza

> Uros.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
Yes, the speedup is the same.
However, I'm testing only x86 performance. Potentially we could hurt
performance on ARM or other targets.
GCC already has the tuning enabled for rs6000, s390, spu.

Evgeny

On Thu, Oct 30, 2014 at 8:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
>> On Tue, Oct 28, 2014 at 1:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>> make check for gcc passed
>>>
>>> On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>>> The results are the same for Silvermont.
>>>> There are no significant changes on Haswell.
>>>> So I agree with Richard, let's enable this x86 wide.
>>>>
>>>> Bootstrap/ passed.
>>>> Make check in progress.
>>>> Is it ok?
>>>>
>>>> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>>>>
>>>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>>>     PARAM_MAX_COMPLETELY_PEELED_INSNS.
>>
>> Let's wait for Honza's approval ...
>
> Looking through the emails, it is not clear to me whether you re-tested that
> this still gives the intended speedup with the tree-level loop peeling
> (committed 2014-10-14).
> If it still works as intended, I do not think we have any reason not to change
> the default in params.def, given that even ARM folks are calling for peeling
> by default.
>
> Honza
>
>> Uros.
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
make check for gcc passed

On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
> The results are the same for Silvermont.
> There are no significant changes on Haswell.
> So I agree with Richard, let's enable this x86 wide.
>
> Bootstrap/ passed.
> Make check in progress.
> Is it ok?
>
> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>
>     * config/i386/i386.c (ix86_option_override_internal): Increase
>     PARAM_MAX_COMPLETELY_PEELED_INSNS.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6337aa5..5ac10eb 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
>                          opts->x_param_values,
>                          opts_set->x_param_values);
>
> +  /* Extend full peel max insns parameter for x86.  */
> +  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> +                        120,
> +                        opts->x_param_values,
> +                        opts_set->x_param_values);
> +
>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>    if (opts->x_flag_prefetch_loop_arrays < 0
>        && HAVE_prefetch
>
> On Mon, Oct 13, 2014 at 4:23 PM, Jan Hubicka hubi...@ucw.cz wrote:
>>> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>>> Hi,
>>>>
>>>> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high
>>>> branch cost.
>>>> Bootstrap and make check are in progress.
>>>> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>>>> with -Ofast on Silvermont
>>>> Spec2000:
>>>> +5% gain on 173.applu
>>>> +1% gain on 255.vortex
>>>>
>>>> Is it OK for trunk when it passes bootstrap and make check?
>>>
>>> This is only a 20% increase - from 100 to 120. I would instead suggest
>>> to explore doing this change unconditionally if it helps that much.
>>
>> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
>> completely artificially. I do not recall any serious tuning of this flag.
>>
>> Note that I plan to update
>> https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to current tree so
>> PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at gimple level rather than
>> tree, changing its meaning somewhat.
>>
>> Perhaps I could try to find time this or next week to update the patch so we
>> do not need to do the tuning twice.
>>
>> Honza
>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Evgeny
>>>>
>>>> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
>>>>
>>>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>>>     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>>>>     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>>>>     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>>>>     CPUs with high branch cost.
>>>>
>>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>>> index 6337aa5..5ac10eb 100644
>>>> --- a/gcc/config/i386/i386.c
>>>> +++ b/gcc/config/i386/i386.c
>>>> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>>>>                          opts->x_param_values,
>>>>                          opts_set->x_param_values);
>>>>
>>>> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
>>>> +  if (TARGET_HIGH_BRANCH_COST)
>>>> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>>>> +                          120,
>>>> +                          opts->x_param_values,
>>>> +                          opts_set->x_param_values);
>>>> +
>>>> +
>>>>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>>>>    if (opts->x_flag_prefetch_loop_arrays < 0
>>>>        && HAVE_prefetch
>>>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>>>> index 2c64162..da0c57b 100644
>>>> --- a/gcc/config/i386/i386.h
>>>> +++ b/gcc/config/i386/i386.h
>>>> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>>>>  #define TARGET_INTER_UNIT_CONVERSIONS \
>>>>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>>>>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
>>>> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>>>>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>>>>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>>>>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
>>>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>>>> index b6b210e..04d8bf8 100644
>>>> --- a/gcc/config/i386/x86-tune.def
>>>> +++ b/gcc/config/i386/x86-tune.def
>>>> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>>>>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>>>>            m_ATHLON_K8 | m_AMDFAM10)
>>>>
>>>> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
>>>> +   used to tune unroll, if-cvt, inline... heuristics.  */
>>>> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
>>>> +          m_BONNELL | m_SILVERMONT | m_INTEL)
>>>> +
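The original patch touches three files because a tuning flag in the i386 back end is declared once in x86-tune.def and then surfaced through a `TARGET_*` macro that indexes `ix86_tune_features`. The sketch below imitates that arrangement with a self-contained X-macro list. It is an illustration only: all `TOY_*` / `toy_*` names and the CPU masks are hypothetical, and GCC includes the .def file with different `DEF_TUNE` definitions rather than using a list macro.

```c
#include <assert.h>

/* Hypothetical CPU bits, standing in for m_BONNELL, m_SILVERMONT, etc.  */
#define TOY_CPU_BONNELL    (1u << 0)
#define TOY_CPU_SILVERMONT (1u << 1)
#define TOY_CPU_HASWELL    (1u << 2)

/* One line per tuning flag: identifier, printable name, CPUs that enable it.  */
#define TOY_TUNE_LIST(X)                                       \
  X(TOY_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",               \
    TOY_CPU_BONNELL | TOY_CPU_SILVERMONT)                      \
  X(TOY_TUNE_HIGH_BRANCH_COST, "high_branch_cost",             \
    TOY_CPU_BONNELL | TOY_CPU_SILVERMONT)

/* First expansion: the enumeration of flag indices.  */
enum toy_tune {
#define X(id, name, mask) id,
  TOY_TUNE_LIST(X)
#undef X
  TOY_TUNE_LAST
};

/* Second expansion: the per-flag CPU masks, in the same order.  */
const unsigned toy_tune_mask[TOY_TUNE_LAST] = {
#define X(id, name, mask) (mask),
  TOY_TUNE_LIST(X)
#undef X
};

/* Per-flag on/off table, filled in once the target CPU is known.  */
unsigned char toy_tune_features[TOY_TUNE_LAST];

void toy_set_tune_features(unsigned cpu_bit)
{
  for (int i = 0; i < TOY_TUNE_LAST; i++)
    toy_tune_features[i] = (toy_tune_mask[i] & cpu_bit) != 0;
}

/* Convenience macro in the style of TARGET_HIGH_BRANCH_COST.  */
#define TOY_HIGH_BRANCH_COST toy_tune_features[TOY_TUNE_HIGH_BRANCH_COST]
```

Adding a flag then means adding one list entry and one accessor macro, which mirrors the shape of the i386.h and x86-tune.def hunks.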
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
The results are the same for Silvermont.
There are no significant changes on Haswell.
So I agree with Richard, let's enable this x86 wide.

Bootstrap/ passed.
Make check in progress.
Is it ok?

2014-10-25  Evgeny Stupachenko  evstu...@gmail.com

    * config/i386/i386.c (ix86_option_override_internal): Increase
    PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        120,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch

On Mon, Oct 13, 2014 at 4:23 PM, Jan Hubicka hubi...@ucw.cz wrote:
>> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>> Hi,
>>>
>>> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high
>>> branch cost.
>>> Bootstrap and make check are in progress.
>>> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>>> with -Ofast on Silvermont
>>> Spec2000:
>>> +5% gain on 173.applu
>>> +1% gain on 255.vortex
>>>
>>> Is it OK for trunk when it passes bootstrap and make check?
>>
>> This is only a 20% increase - from 100 to 120. I would instead suggest
>> to explore doing this change unconditionally if it helps that much.
>
> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
> completely artificially. I do not recall any serious tuning of this flag.
>
> Note that I plan to update
> https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to current tree so
> PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at gimple level rather than
> tree, changing its meaning somewhat.
>
> Perhaps I could try to find time this or next week to update the patch so we
> do not need to do the tuning twice.
>
> Honza
>
>> Richard.
>>
>>> Thanks,
>>> Evgeny
>>>
>>> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
>>>
>>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>>     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>>>     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>>>     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>>>     CPUs with high branch cost.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index 6337aa5..5ac10eb 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>>>                          opts->x_param_values,
>>>                          opts_set->x_param_values);
>>>
>>> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
>>> +  if (TARGET_HIGH_BRANCH_COST)
>>> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>>> +                          120,
>>> +                          opts->x_param_values,
>>> +                          opts_set->x_param_values);
>>> +
>>> +
>>>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>>>    if (opts->x_flag_prefetch_loop_arrays < 0
>>>        && HAVE_prefetch
>>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>>> index 2c64162..da0c57b 100644
>>> --- a/gcc/config/i386/i386.h
>>> +++ b/gcc/config/i386/i386.h
>>> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>>>  #define TARGET_INTER_UNIT_CONVERSIONS \
>>>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>>>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
>>> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>>>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>>>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>>>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
>>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>>> index b6b210e..04d8bf8 100644
>>> --- a/gcc/config/i386/x86-tune.def
>>> +++ b/gcc/config/i386/x86-tune.def
>>> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>>>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>>>            m_ATHLON_K8 | m_AMDFAM10)
>>>
>>> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
>>> +   used to tune unroll, if-cvt, inline... heuristics.  */
>>> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
>>> +          m_BONNELL | m_SILVERMONT | m_INTEL)
>>> +
>>>  /*****************************************************************************/
>>>  /* Integer instruction selection tuning                                      */
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> Hi,
>
> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high
> branch cost.
> Bootstrap and make check are in progress.
> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
> with -Ofast on Silvermont
> Spec2000:
> +5% gain on 173.applu
> +1% gain on 255.vortex
>
> Is it OK for trunk when it passes bootstrap and make check?

This is only a 20% increase - from 100 to 120. I would instead suggest
to explore doing this change unconditionally if it helps that much.

Richard.

> Thanks,
> Evgeny
>
> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
>
>     * config/i386/i386.c (ix86_option_override_internal): Increase
>     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>     CPUs with high branch cost.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6337aa5..5ac10eb 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>                          opts->x_param_values,
>                          opts_set->x_param_values);
>
> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
> +  if (TARGET_HIGH_BRANCH_COST)
> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> +                          120,
> +                          opts->x_param_values,
> +                          opts_set->x_param_values);
> +
> +
>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>    if (opts->x_flag_prefetch_loop_arrays < 0
>        && HAVE_prefetch
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 2c64162..da0c57b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>  #define TARGET_INTER_UNIT_CONVERSIONS \
>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index b6b210e..04d8bf8 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>            m_ATHLON_K8 | m_AMDFAM10)
>
> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
> +   used to tune unroll, if-cvt, inline... heuristics.  */
> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
> +          m_BONNELL | m_SILVERMONT | m_INTEL)
> +
>  /*****************************************************************************/
>  /* Integer instruction selection tuning                                      */
>  /*****************************************************************************/
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
I need to collect data from Haswell, but the patch should not help its
performance much, just increase code size.

On Mon, Oct 13, 2014 at 12:01 PM, Richard Biener richard.guent...@gmail.com wrote:
> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> Hi,
>>
>> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high
>> branch cost.
>> Bootstrap and make check are in progress.
>> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>> with -Ofast on Silvermont
>> Spec2000:
>> +5% gain on 173.applu
>> +1% gain on 255.vortex
>>
>> Is it OK for trunk when it passes bootstrap and make check?
>
> This is only a 20% increase - from 100 to 120. I would instead suggest
> to explore doing this change unconditionally if it helps that much.
>
> Richard.
>
>> Thanks,
>> Evgeny
>>
>> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
>>
>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>>     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>>     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>>     CPUs with high branch cost.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 6337aa5..5ac10eb 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>>                          opts->x_param_values,
>>                          opts_set->x_param_values);
>>
>> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
>> +  if (TARGET_HIGH_BRANCH_COST)
>> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> +                          120,
>> +                          opts->x_param_values,
>> +                          opts_set->x_param_values);
>> +
>> +
>>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>>    if (opts->x_flag_prefetch_loop_arrays < 0
>>        && HAVE_prefetch
>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> index 2c64162..da0c57b 100644
>> --- a/gcc/config/i386/i386.h
>> +++ b/gcc/config/i386/i386.h
>> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>>  #define TARGET_INTER_UNIT_CONVERSIONS \
>>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
>> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> index b6b210e..04d8bf8 100644
>> --- a/gcc/config/i386/x86-tune.def
>> +++ b/gcc/config/i386/x86-tune.def
>> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>>            m_ATHLON_K8 | m_AMDFAM10)
>>
>> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
>> +   used to tune unroll, if-cvt, inline... heuristics.  */
>> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
>> +          m_BONNELL | m_SILVERMONT | m_INTEL)
>> +
>>  /*****************************************************************************/
>>  /* Integer instruction selection tuning                                      */
>>  /*****************************************************************************/
Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> Hi,
>>
>> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high
>> branch cost.
>> Bootstrap and make check are in progress.
>> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>> with -Ofast on Silvermont
>> Spec2000:
>> +5% gain on 173.applu
>> +1% gain on 255.vortex
>>
>> Is it OK for trunk when it passes bootstrap and make check?
>
> This is only a 20% increase - from 100 to 120. I would instead suggest
> to explore doing this change unconditionally if it helps that much.

Agreed, I think the value of 100 was set a decade ago by Zdenek and me
completely artificially. I do not recall any serious tuning of this flag.

Note that I plan to update
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to current tree so
PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at gimple level rather than
tree, changing its meaning somewhat.

Perhaps I could try to find time this or next week to update the patch so we
do not need to do the tuning twice.

Honza

> Richard.
>
>> Thanks,
>> Evgeny
>>
>> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
>>
>>     * config/i386/i386.c (ix86_option_override_internal): Increase
>>     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>>     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>>     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>>     CPUs with high branch cost.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 6337aa5..5ac10eb 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>>                          opts->x_param_values,
>>                          opts_set->x_param_values);
>>
>> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
>> +  if (TARGET_HIGH_BRANCH_COST)
>> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> +                          120,
>> +                          opts->x_param_values,
>> +                          opts_set->x_param_values);
>> +
>> +
>>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>>    if (opts->x_flag_prefetch_loop_arrays < 0
>>        && HAVE_prefetch
>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> index 2c64162..da0c57b 100644
>> --- a/gcc/config/i386/i386.h
>> +++ b/gcc/config/i386/i386.h
>> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>>  #define TARGET_INTER_UNIT_CONVERSIONS \
>>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
>> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> index b6b210e..04d8bf8 100644
>> --- a/gcc/config/i386/x86-tune.def
>> +++ b/gcc/config/i386/x86-tune.def
>> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>>            m_ATHLON_K8 | m_AMDFAM10)
>>
>> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
>> +   used to tune unroll, if-cvt, inline... heuristics.  */
>> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
>> +          m_BONNELL | m_SILVERMONT | m_INTEL)
>> +
>>  /*****************************************************************************/
>>  /* Integer instruction selection tuning                                      */
>>  /*****************************************************************************/