Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-22 Thread Eric Botcazou
> OK. Looks like a good performance vs. codesize tradeoff.

Yes, but IMO this should be done in the generic code, since unrolling small
loops is profitable on most architectures.

-- 
Eric Botcazou
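
For readers new to the thread, the transformation being tuned can be sketched by hand. The example below is only an illustration, not compiler output: the function names and the trip count of 4 are hypothetical. It shows a small loop with a known iteration count next to its completely peeled equivalent, where the counter and the backward branch disappear.

```c
#include <assert.h>

/* Hypothetical example: a loop with a small, compile-time-known trip count.  */
enum { N = 4 };

/* Rolled form: one compare-and-branch per iteration.  */
static int dot_rolled (const int *a, const int *b)
{
  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Completely peeled form: every iteration is copied out, so the loop
   counter and the backward branch are gone, at the cost of more code.  */
static int dot_peeled (const int *a, const int *b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Both forms compute the same value; only size and branch count differ.  */
static void check_peeling (void)
{
  const int a[N] = { 1, 2, 3, 4 };
  const int b[N] = { 5, 6, 7, 8 };
  assert (dot_rolled (a, b) == 70);
  assert (dot_peeled (a, b) == 70);
}
```

The parameter under discussion bounds how large the peeled form may become before the compiler declines to do this.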


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-22 Thread Uros Bizjak
On Sat, Nov 22, 2014 at 10:49 AM, Eric Botcazou ebotca...@adacore.com wrote:
>> OK. Looks like a good performance vs. codesize tradeoff.
>
> Yes, but IMO this should be done in the generic code, unrolling small loops is
> profitable on most architectures.

Yeah, but after a couple of pings for a generic change, we went the target way.

Uros.


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-22 Thread Eric Botcazou
> Yeah, but after a couple of pings for a generic change, we went the target
> way.

That's a bit of a shame: the 400 -> 100 change was very likely tested only on
x86-64 and nevertheless applied to the generic code, so the fix repairing the
damage should also be applied to the generic code.

-- 
Eric Botcazou


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-22 Thread Richard Biener
On November 22, 2014 12:24:22 PM CET, Eric Botcazou ebotca...@adacore.com wrote:
>> Yeah, but after a couple of pings for a generic change, we went the
>> target way.
>
> That's a bit of a shame, the 400 -> 100 change was very likely tested only on
> x86-64 and nevertheless applied to the generic code, so the fix repairing the
> damages should also be applied to the generic code.

A patch to bump the generic limit is OK.

Targets that don't want it can reduce it in target-specific code.

Richard.
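
The layering described here (a generic default, an optional target adjustment, and the user's explicit choice on top) can be modelled in a few lines. This is a minimal sketch, not GCC's code: the struct and function names are hypothetical, and it assumes the target-side helper leaves a parameter alone once the user has set it, which is what the separate "set" table passed to maybe_set_param_value in the patches in this thread suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the compiler's parameter tables.  */
enum param_id { MAX_COMPLETELY_PEELED_INSNS, PARAM_COUNT };

struct param_table
{
  int value[PARAM_COUNT];      /* current value of each parameter */
  bool user_set[PARAM_COUNT];  /* true if given explicitly by the user */
};

/* Generic default (200 is the value discussed in this thread).  */
static void init_generic_defaults (struct param_table *t)
{
  t->value[MAX_COMPLETELY_PEELED_INSNS] = 200;
  t->user_set[MAX_COMPLETELY_PEELED_INSNS] = false;
}

/* Explicit user choice: always wins and is remembered as such.  */
static void set_from_command_line (struct param_table *t,
                                   enum param_id id, int v)
{
  t->value[id] = v;
  t->user_set[id] = true;
}

/* Target adjustment: applies only when the user did not choose a value.  */
static void target_maybe_set (struct param_table *t, enum param_id id, int v)
{
  if (!t->user_set[id])
    t->value[id] = v;
}

static void check_override_order (void)
{
  struct param_table t;

  /* A target that does not want the larger limit lowers it.  */
  init_generic_defaults (&t);
  target_maybe_set (&t, MAX_COMPLETELY_PEELED_INSNS, 100);
  assert (t.value[MAX_COMPLETELY_PEELED_INSNS] == 100);

  /* An explicit user value survives the target adjustment.  */
  init_generic_defaults (&t);
  set_from_command_line (&t, MAX_COMPLETELY_PEELED_INSNS, 400);
  target_maybe_set (&t, MAX_COMPLETELY_PEELED_INSNS, 100);
  assert (t.value[MAX_COMPLETELY_PEELED_INSNS] == 400);
}
```

Under this model, moving the bump into the generic default costs other targets nothing they cannot undo in their own option-override hook.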




Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-22 Thread Uros Bizjak
On Sat, Nov 22, 2014 at 7:38 PM, Richard Biener
richard.guent...@gmail.com wrote:
> On November 22, 2014 12:24:22 PM CET, Eric Botcazou ebotca...@adacore.com wrote:
>>> Yeah, but after a couple of pings for a generic change, we went the
>>> target way.
>>
>> That's a bit of a shame, the 400 -> 100 change was very likely tested only on
>> x86-64 and nevertheless applied to the generic code, so the fix repairing the
>> damages should also be applied to the generic code.
>
> A patch to bump the generic limit is OK.
>
> Targets that don't want it can reduce it in target-specific code.

I have committed the attached patch:

2014-11-22  Uros Bizjak  ubiz...@gmail.com

* params.def (PARAM_MAX_COMPLETELY_PEELED_INSNS): Increase to 200.
* config/i386/i386.c (ix86_option_override_internal): Do not increase
PARAM_MAX_COMPLETELY_PEELED_INSNS.

Bootstrapped on x86_64-linux-gnu.

Uros.
Index: params.def
===================================================================
--- params.def  (revision 217961)
+++ params.def  (working copy)
@@ -303,7 +303,7 @@ DEFPARAM(PARAM_MAX_PEEL_BRANCHES,
 DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
         "max-completely-peeled-insns",
         "The maximum number of insns of a completely peeled loop",
-        100, 0, 0)
+        200, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely.  */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
         "max-completely-peel-times",
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 217961)
+++ config/i386/i386.c  (working copy)
@@ -4142,12 +4142,6 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
-  /* Increase full peel max insns parameter for x86.  */
-  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
-                        200,
-                        opts->x_param_values,
-                        opts_set->x_param_values);
-
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-21 Thread Evgeny Stupachenko
PING.
200 currently looks optimal for x86.
Let's commit the following:

2014-11-21  Evgeny Stupachenko  evstu...@gmail.com
* config/i386/i386.c (ix86_option_override_internal): Increase
PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);

+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        200,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch



Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-21 Thread Uros Bizjak
On Fri, Nov 21, 2014 at 11:46 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
> PING.
> 200 currently looks optimal for x86.
> Let's commit the following:
>
> 2014-11-21  Evgeny Stupachenko  evstu...@gmail.com
> * config/i386/i386.c (ix86_option_override_internal): Increase
> PARAM_MAX_COMPLETELY_PEELED_INSNS.

OK. Looks like a good performance vs. codesize tradeoff.

Uros.


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-12 Thread Evgeny Stupachenko
Code size for spec2000 is almost unchanged (many benchmarks have the
same binaries).
For those that changed, we have the following code size increases (200 vs 100,
both dynamic builds with -Ofast -funroll-loops -flto):
183.equake +10%
164.gzip, 173.applu +3.5%
187.facerec, 191.fma3d +2.5%
200.sixtrack +2%
177.mesa, 178.galgel +1%




Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-11 Thread Evgeny Stupachenko
150 and 200 make Silvermont performance better on 173.applu (+8%) and
183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
A higher value of 300 leaves the performance of the mentioned tests
unchanged, but adds some regressions on other benchmarks.

So I like 200 as well as 120 and 150, but can confirm performance
gains only for x86.




Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-11 Thread Eric Botcazou
> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
> A higher value of 300 leaves the performance of the mentioned tests
> unchanged, but adds some regressions on other benchmarks.
>
> So I like 200 as well as 120 and 150, but can confirm performance
> gains only for x86.

IMO it's either 150 or 200.  We chose 200 for our 4.9-based compiler because 
this gave the performance boost without affecting the code size (on x86-64) 
and because this was previously 400, but it's your call.

-- 
Eric Botcazou


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-11 Thread Jan Hubicka
>> 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>> 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>> A higher value of 300 leaves the performance of the mentioned tests
>> unchanged, but adds some regressions on other benchmarks.
>>
>> So I like 200 as well as 120 and 150, but can confirm performance
>> gains only for x86.
>
> IMO it's either 150 or 200.  We chose 200 for our 4.9-based compiler because
> this gave the performance boost without affecting the code size (on x86-64)
> and because this was previously 400, but it's your call.

Both 150 and 200 work for me globally if there is not too much code size
bloat (I did not see code size mentioned here).

What I did before decreasing the bounds was strengthening the loop iteration
count bounds and adding logic that predicts constant propagation enabled by
unrolling. For this reason 400 became too large, as we did a lot more complete
unrolling than before. Also, 400 in older compilers is not really 400 in newer
ones.

Because I saw performance drop only with values below 50, I went for 100.
It would be very interesting to actually analyze what happens for those two
benchmarks (that should not be too hard with perf).

Honza
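
The point that "400 in older compilers is not really 400 in newer" can be made concrete with a toy size estimate. The sketch below is a simplified model, not GCC's actual heuristic: the struct, its fields and the numbers are hypothetical. It assumes the estimate is the per-iteration body size, minus the instructions predicted to fold away after peeling, times the known trip count, compared against the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a candidate loop.  */
struct loop_estimate
{
  unsigned body_insns;        /* insns in one iteration */
  unsigned eliminated_insns;  /* insns per copy predicted to fold away
                                 (exit test, induction variable, constants) */
  unsigned iterations;        /* known constant trip count */
};

/* Estimated size of the loop after complete peeling.  */
static unsigned estimated_peeled_size (const struct loop_estimate *e)
{
  unsigned kept = e->body_insns > e->eliminated_insns
                  ? e->body_insns - e->eliminated_insns : 0;
  return kept * e->iterations;
}

/* The limit plays the role of max-completely-peeled-insns.  */
static bool may_peel_completely (const struct loop_estimate *e,
                                 unsigned max_insns)
{
  return estimated_peeled_size (e) <= max_insns;
}

static void check_limits (void)
{
  /* Same loop, with and without a prediction of eliminated insns.  */
  struct loop_estimate naive = { 30, 0, 10 };   /* 30 * 10 = 300 */
  struct loop_estimate smart = { 30, 12, 10 };  /* 18 * 10 = 180 */

  assert (may_peel_completely (&naive, 400));
  assert (!may_peel_completely (&naive, 200));
  assert (may_peel_completely (&smart, 200));
  assert (!may_peel_completely (&smart, 100));
}
```

In this model, a better prediction of what folds away lets the same loop fit under a smaller limit, which is why the same numeric limit admits more loops once the estimate improves.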


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-11-07 Thread Evgeny Stupachenko
So are there any objections to enabling this
(PARAM_MAX_COMPLETELY_PEELED_INSNS increase from 100 to 120) for x86?



Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-31 Thread Eric Botcazou
> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
> completely artificially. I do not recall any serious tuning of this flag.

Are you talking about PARAM_MAX_COMPLETELY_PEELED_INSNS here?  If so, see:
  https://gcc.gnu.org/ml/gcc-patches/2012-11/msg01193.html

We have experienced performance regressions because of this arbitrary change
and bumped it back to 200 unconditionally.

-- 
Eric Botcazou


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-31 Thread Evgeny Stupachenko
I've measured spec2000 and spec2006 as well, and EEMBC for Silvermont in addition.
The 100->120 change gives a gain for Silvermont; the results on Haswell are flat.



Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-30 Thread Uros Bizjak
On Tue, Oct 28, 2014 at 1:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> make check for gcc passed
>
> On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> The results are the same for Silvermont.
>> There are no significant changes on Haswell.
>> So I agree with Richard, let's enable this x86-wide.
>>
>> Bootstrap passed.
>> Make check in progress.
>> Is it ok?
>>
>> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>> * config/i386/i386.c (ix86_option_override_internal): Increase
>> PARAM_MAX_COMPLETELY_PEELED_INSNS.

Let's wait for Honza's approval ...

Uros.


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-30 Thread Jan Hubicka
> On Tue, Oct 28, 2014 at 1:07 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>> make check for gcc passed
>>
>> On Mon, Oct 27, 2014 at 11:10 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>> The results are the same for Silvermont.
>>> There are no significant changes on Haswell.
>>> So I agree with Richard, let's enable this x86-wide.
>>>
>>> Bootstrap passed.
>>> Make check in progress.
>>> Is it ok?
>>>
>>> 2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
>>> * config/i386/i386.c (ix86_option_override_internal): Increase
>>> PARAM_MAX_COMPLETELY_PEELED_INSNS.
>
> Let's wait for Honza's approval ...

Looking through the emails, it is not clear to me whether you re-tested that
this still gives the intended speedup with the tree-level loop peeling
(committed 2014-10-14).
If it still works as intended, I do not think we have any reason not to change
the default in params.def, given that even ARM folks are calling for peeling by
default.

Honza
 
> Uros.


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-30 Thread Evgeny Stupachenko
Yes, the speedup is the same. However, I'm testing only x86
performance. Potentially we could hurt performance on ARM or other
targets.
GCC already has the tuning enabled for rs6000, s390 and spu.

Evgeny



Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-28 Thread Evgeny Stupachenko
make check for gcc passed


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-27 Thread Evgeny Stupachenko
The results are the same for Silvermont.
There are no significant changes on Haswell.
So I agree with Richard, let's enable this x86-wide.

Bootstrap passed.
Make check in progress.
Is it ok?

2014-10-25  Evgeny Stupachenko  evstu...@gmail.com
* config/i386/i386.c (ix86_option_override_internal): Increase
PARAM_MAX_COMPLETELY_PEELED_INSNS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);

+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                        120,
+                        opts->x_param_values,
+                        opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch

On Mon, Oct 13, 2014 at 4:23 PM, Jan Hubicka hubi...@ucw.cz wrote:
>> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
>>> Hi,
>>>
>>> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with
>>> high branch cost.
>>> Bootstrap and make check are in progress.
>>> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>>> with -Ofast on Silvermont.
>>> Spec2000:
>>> +5% gain on 173.applu
>>> +1% gain on 255.vortex
>>>
>>> Is it OK for trunk once it passes bootstrap and make check?
>>
>> This is only a 20% increase - from 100 to 120.  I would instead suggest
>> to explore doing this change unconditionally if it helps that much.
>
> Agreed, I think the value of 100 was set a decade ago by Zdenek and me
> completely artificially. I do not recall any serious tuning of this flag.
>
> Note that I plan to update
> https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to current tree so
> PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at gimple level rather than tree,
> changing its meaning somewhat.
>
> Perhaps I could try to find time this or next week to update the patch so we do
> not need to do the tuning twice.
>
> Honza
>
>> Richard.


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-13 Thread Richard Biener
On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
> Hi,
>
> The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with
> high branch cost.
> Bootstrap and make check are in progress.
> The patch boosts (up to 2.5 times improvement) several benchmarks compiled
> with -Ofast on Silvermont.
> Spec2000:
> +5% gain on 173.applu
> +1% gain on 255.vortex
>
> Is it OK for trunk once it passes bootstrap and make check?

This is only a 20% increase - from 100 to 120.  I would instead suggest
to explore doing this change unconditionally if it helps that much.

Richard.

> Thanks,
> Evgeny
>
> 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
> * config/i386/i386.c (ix86_option_override_internal): Increase
> PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
> * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
> * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
> CPUs with high branch cost.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6337aa5..5ac10eb 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>                          opts->x_param_values,
>                          opts_set->x_param_values);
>
> +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
> +  if (TARGET_HIGH_BRANCH_COST)
> +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> +                           120,
> +                           opts->x_param_values,
> +                           opts_set->x_param_values);
> +
> +
>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>    if (opts->x_flag_prefetch_loop_arrays < 0
>        && HAVE_prefetch
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 2c64162..da0c57b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>  #define TARGET_INTER_UNIT_CONVERSIONS \
>         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
> +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
>  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
>  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index b6b210e..04d8bf8 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, four_jump_limit,
>            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>            m_ATHLON_K8 | m_AMDFAM10)
>
> +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
> +   used to tune unroll, if-cvt, inline... heuristics.  */
> +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, high_branch_cost,
> +          m_BONNELL | m_SILVERMONT | m_INTEL)
> +
>  /*****************************************************************************/
>  /* Integer instruction selection tuning                                      */
>  /*****************************************************************************/
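
The per-CPU gating proposed in the quoted patch can be sketched as a bitmask test. This is a simplified model, not the i386 backend's code: the enum, the macro M and the function names are hypothetical; only the set of tunings (Bonnell, Silvermont, generic Intel) and the values 100 and 120 are taken from the patch and the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of tuning targets, one bit each.  */
enum cpu { CPU_BONNELL, CPU_SILVERMONT, CPU_HASWELL, CPU_INTEL, CPU_COUNT };
#define M(cpu) (1u << (cpu))

/* Mirrors the mask in the proposed X86_TUNE_HIGH_BRANCH_COST entry.  */
static const unsigned high_branch_cost_mask
  = M (CPU_BONNELL) | M (CPU_SILVERMONT) | M (CPU_INTEL);

/* True when the selected tuning is in the high-branch-cost set.  */
static bool tune_high_branch_cost (enum cpu tune)
{
  return (high_branch_cost_mask & M (tune)) != 0;
}

/* Limit chosen by the first version of the patch: 120 for
   high-branch-cost tunings, the then-generic 100 otherwise.  */
static int max_completely_peeled_insns (enum cpu tune)
{
  return tune_high_branch_cost (tune) ? 120 : 100;
}

static void check_tuning (void)
{
  assert (max_completely_peeled_insns (CPU_SILVERMONT) == 120);
  assert (max_completely_peeled_insns (CPU_HASWELL) == 100);
}
```

Later messages in the thread dropped this gating in favour of one x86-wide value, and finally a generic default, which removes the need for the extra tuning flag.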


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-13 Thread Evgeny Stupachenko
I need to collect data from Haswell, but the patch should not help
its performance much; it would just increase code size.

On Mon, Oct 13, 2014 at 12:01 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 Hi,

 The patch increase PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with
 high branch cost.
 Bootstrap and make check are in progress.
 The patch boosts (up to 2,5 times improve) several benchmarks compiled
 with -Ofast on Silvermont
 Spec2000:
 +5% gain on 173.applu
 +1% gain on 255.vortex

 Is it ok for trunk when pass bootstrap and make check?

 This is only a 20% increase - from 100 to 120.  I would instead suggest
 to explore doing this change unconditionally if it helps that much.

 Richard.

 Thanks,
 Evgeny

 2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
 * config/i386/i386.c (ix86_option_override_internal): Increase
 PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
 * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
 * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
 CPUs with high branch cost.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 6337aa5..5ac10eb 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
  opts-x_param_values,
  opts_set-x_param_values);

 +  /* Extend full peel max insns parameter for CPUs with high branch cost.  
 */
 +  if (TARGET_HIGH_BRANCH_COST)
 +maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
 +  120,
 +  opts-x_param_values,
 +  opts_set-x_param_values);
 +
 +
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
if (opts-x_flag_prefetch_loop_arrays  0
 HAVE_prefetch
 diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
 index 2c64162..da0c57b 100644
 --- a/gcc/config/i386/i386.h
 +++ b/gcc/config/i386/i386.h
 @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
  #define TARGET_INTER_UNIT_CONVERSIONS \
         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
 +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
  #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
  #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
  #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
 diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
 index b6b210e..04d8bf8 100644
 --- a/gcc/config/i386/x86-tune.def
 +++ b/gcc/config/i386/x86-tune.def
 @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
            m_ATHLON_K8 | m_AMDFAM10)
 
 +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
 +   used to tune unroll, if-cvt, inline... heuristics.  */
 +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
 +          m_BONNELL | m_SILVERMONT | m_INTEL)
 +
  /*****************************************************************************/
  /* Integer instruction selection tuning                                      */
  /*****************************************************************************/
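
The i386.c hunk above goes through maybe_set_param_value rather than
overwriting the parameter directly. As a rough illustration of why that
matters, here is a simplified, self-contained model of the "set only if the
user did not" behaviour. The names toy_param and toy_maybe_set_param_value
are hypothetical stand-ins, not GCC's real types or functions, and the real
parameter tables are assumed to behave similarly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's parameter enumeration.  */
enum toy_param
{
  TOY_MAX_COMPLETELY_PEELED_INSNS,
  TOY_PARAM_LAST
};

/* Simplified model: install VALUE as the new default for PARAM unless the
   user already set PARAM explicitly (for example with --param).  PARAMS
   holds the current values, PARAMS_SET records explicit user settings.  */
static void
toy_maybe_set_param_value (enum toy_param param, int value,
                           int *params, const bool *params_set)
{
  assert (params != NULL && params_set != NULL);
  if (!params_set[param])
    params[param] = value;
}
```

With a table that starts at the generic default of 100, a call with 120
raises the value to 120, while a table the user has explicitly set to 200
keeps 200. Under this model a target default never overrides an explicit
user choice.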


Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly

2014-10-13 Thread Jan Hubicka
 On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
  Hi,
 
  The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with
  high branch cost.
  Bootstrap and make check are in progress.
  The patch speeds up several benchmarks (by up to 2.5 times) compiled
  with -Ofast on Silvermont.
  Spec2000:
  +5% gain on 173.applu
  +1% gain on 255.vortex
 
  Is it OK for trunk once bootstrap and make check pass?
 
 This is only a 20% increase - from 100 to 120.  I would instead suggest
 to explore doing this change unconditionally if it helps that much.

Agreed. I think the value of 100 was set a decade ago by Zdenek and me completely
artificially. I do not recall any serious tuning of this flag.

Note that I plan to update
https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to the current tree so
PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at the gimple level rather than
the tree level, changing its meaning somewhat.

Perhaps I could try to find time this or next week to update the patch so we do
not need to do the tuning twice.

Honza

 
 Richard.
 
  Thanks,
  Evgeny
 
  2014-10-10  Evgeny Stupachenko  evstu...@gmail.com
  * config/i386/i386.c (ix86_option_override_internal): Increase
  PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
  * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
  * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
  CPUs with high branch cost.
 
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index 6337aa5..5ac10eb 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
                           opts->x_param_values,
                           opts_set->x_param_values);
  
  +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
  +  if (TARGET_HIGH_BRANCH_COST)
  +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
  +                           120,
  +                           opts->x_param_values,
  +                           opts_set->x_param_values);
  +
  +
     /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
     if (opts->x_flag_prefetch_loop_arrays < 0
         && HAVE_prefetch
  diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
  index 2c64162..da0c57b 100644
  --- a/gcc/config/i386/i386.h
  +++ b/gcc/config/i386/i386.h
  @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
   #define TARGET_INTER_UNIT_CONVERSIONS \
          ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
   #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
  +#define TARGET_HIGH_BRANCH_COST ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
   #define TARGET_SCHEDULE        ix86_tune_features[X86_TUNE_SCHEDULE]
   #define TARGET_USE_BT          ix86_tune_features[X86_TUNE_USE_BT]
   #define TARGET_USE_INCDEC      ix86_tune_features[X86_TUNE_USE_INCDEC]
  diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
  index b6b210e..04d8bf8 100644
  --- a/gcc/config/i386/x86-tune.def
  +++ b/gcc/config/i386/x86-tune.def
  @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
             m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
             m_ATHLON_K8 | m_AMDFAM10)
  
  +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
  +   used to tune unroll, if-cvt, inline... heuristics.  */
  +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
  +          m_BONNELL | m_SILVERMONT | m_INTEL)
  +
   /*****************************************************************************/
   /* Integer instruction selection tuning                                      */
   /*****************************************************************************/
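
To make the 100 versus 120 discussion concrete, here is a toy size check for
complete peeling. It is only a sketch under a strong assumption: that the
cost of a completely peeled loop is simply body size times trip count. GCC's
real heuristic estimates the size after simplification and applies further
limits, so toy_can_completely_peel is a hypothetical helper, not anything in
the GCC sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a loop whose trip count is known at compile time may be
   completely peeled when the estimated size of all peeled copies stays
   within MAX_PEELED_INSNS.  Here the estimate is just
   BODY_INSNS * TRIP_COUNT.  */
static bool
toy_can_completely_peel (int body_insns, int trip_count, int max_peeled_insns)
{
  assert (body_insns >= 0 && trip_count >= 0);
  return (long) body_insns * trip_count <= max_peeled_insns;
}
```

Under this model a 14-insn body with 8 iterations (112 insns in total) is
rejected at a limit of 100 and accepted at 120, while a 12-insn body with 8
iterations (96 insns) is accepted either way. Only loops in that narrow band
are affected by the bump.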