[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #32 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #30)
> I didn't close it because I wanted to see updated benchmark numbers.  Either
> I'll grab the benchmark, or if somebody else posts the latest numbers, we
> can close it or keep open depending on that.

gcc7 -O3:LU  Mflops:  5444.74
gcc7 -Ofast: LU  Mflops:  5385.51
gcc6 -O3:LU  Mflops:  5515.91
gcc6 -Ofast: LU  Mflops:  5487.94

so there's a <2% regression remaining (noise level is ~0.5%).
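
The "<2%" figure can be re-derived from the four Mflops values quoted above. A minimal sketch in C; the helper names `regression_pct` and `report` are made up for this illustration and are not part of SciMark2 or GCC:

```c
#include <assert.h>
#include <stdio.h>

/* Relative slowdown of `current` versus `baseline`, in percent.
   A positive result means `current` delivers fewer Mflops. */
double regression_pct(double baseline, double current)
{
    return (baseline - current) / baseline * 100.0;
}

/* Print the gcc6 -> gcc7 change for the two flag sets quoted above. */
void report(void)
{
    printf("-O3:    %.2f%%\n", regression_pct(5515.91, 5444.74));
    printf("-Ofast: %.2f%%\n", regression_pct(5487.94, 5385.51));
}
```

Both values come out between 1% and 2%, which matches the summary and sits above the stated ~0.5% noise level.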

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #31 from Richard Biener  ---
Yep, looks fixed on the tester.

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #30 from Jakub Jelinek  ---
I didn't close it because I wanted to see updated benchmark numbers.  Either
I'll grab the benchmark, or if somebody else posts the latest numbers, we can
close it or keep open depending on that.

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #29 from Jeffrey A. Law  ---
Jakub's fix addresses the last remaining issue IIUC.  Should we close this out?

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #28 from Jakub Jelinek  ---
Author: jakub
Date: Wed Apr 12 18:09:47 2017
New Revision: 246882

URL: https://gcc.gnu.org/viewcvs?rev=246882&root=gcc&view=rev
Log:
PR tree-optimization/79390
* optabs.c (emit_conditional_move): If the preferred op2/op3 operand
order does not result in usable sequence, retry with reversed operand
order.

* gcc.target/i386/pr70465-2.c: Xfail the scan-assembler-not test.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr70465-2.c

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #27 from rguenther at suse dot de  ---
On Wed, 12 Apr 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
> 
> --- Comment #26 from Jakub Jelinek  ---
> Created attachment 41189
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41189&action=edit
> gcc7-pr79390-ecm.patch
> 
> Untested fix for the -O3 -ffast-math -march=haswell case.
> The difference between -fno-fast-math and -ffast-math is in:
>   if (swap_commutative_operands_p (op2, op3)
>       && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
>           != UNKNOWN))
>     {
>       std::swap (op2, op3);
>       code = reversed;
>     }
> 
> swap_commutative_operands_p is true in both cases, but without
> -ffast-math reversed_comparison_code_parts fails (returns UNKNOWN), so we
> don't try that order and succeed, while with -ffast-math it doesn't fail,
> returns LE, but we reject it in the predicates of the cmov insn and thus
> don't emit anything.  This patch just retries with the other order of
> operands in that case.

Looks sensible.

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #26 from Jakub Jelinek  ---
Created attachment 41189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41189&action=edit
gcc7-pr79390-ecm.patch

Untested fix for the -O3 -ffast-math -march=haswell case.
The difference between -fno-fast-math and -ffast-math is in:
  if (swap_commutative_operands_p (op2, op3)
      && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
          != UNKNOWN))
    {
      std::swap (op2, op3);
      code = reversed;
    }

swap_commutative_operands_p is true in both cases, but without
-ffast-math reversed_comparison_code_parts fails (returns UNKNOWN), so we don't
try that order and succeed, while with -ffast-math it doesn't fail, returns LE,
but we reject it in the predicates of the cmov insn and thus don't emit
anything.  This patch just retries with the other order of operands in that
case.
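
One plausible reading of why the reversal is only available under -ffast-math: when NaNs must be honored, `a > b` and `!(a <= b)` are different conditions, so swapping the two arms and turning GT into LE does not preserve the selected value. The sketch below shows this at the C source level. It is not GCC's RTL code, the function names are hypothetical, and it has to be built without -ffast-math for the NaN cases to behave as described.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Original select: pick a when a > b, otherwise b. */
double select_gt(double a, double b)
{
    return a > b ? a : b;
}

/* Same select with the arms swapped and the comparison reversed to <=.
   Matches select_gt only when neither input is a NaN. */
double select_le_swapped(double a, double b)
{
    return a <= b ? b : a;
}

/* NaN test that does not rely on any library call. */
bool is_nan(double x)
{
    return x != x;
}

/* True when both forms select the same value for the given inputs. */
bool forms_agree(double a, double b)
{
    double x = select_gt(a, b);
    double y = select_le_swapped(a, b);
    /* Two NaN results count as agreement. */
    if (is_nan(x) || is_nan(y))
        return is_nan(x) && is_nan(y);
    return x == y;
}
```

For ordinary values `forms_agree` holds; as soon as one operand is a NaN, both comparisons are false and the two forms pick different arms.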

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #25 from Richard Biener  ---
So the original report is fixed (-O3 -march=native).  But adding -ffast-math
still ends up regressing.

At this point it's probably appropriate to re-target to GCC 8.

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #24 from Richard Biener  ---
Author: rguenth
Date: Wed Apr 12 09:41:02 2017
New Revision: 246869

URL: https://gcc.gnu.org/viewcvs?rev=246869&root=gcc&view=rev
Log:
2017-04-12  Richard Biener  

PR tree-optimization/79390
* gimple-ssa-split-paths.c (is_feasible_trace): Restrict
threading case even more.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-ssa-split-paths.c

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> (In reply to rguent...@suse.de from comment #21)
> > On April 7, 2017 6:57:13 PM GMT+02:00, "jakub at gcc dot gnu.org"
> >  wrote:
> > >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
> > >
> > >--- Comment #20 from Jakub Jelinek  ---
> > > >So, Richard, any thoughts on what can be done in split paths to avoid
> > >this?
> > 
> > Invent some new heuristic that avoids splitting this case...
> 
> Index: gcc/gimple-ssa-split-paths.c
> ===
> --- gcc/gimple-ssa-split-paths.c(revision 246803)
> +++ gcc/gimple-ssa-split-paths.c(working copy)
> @@ -249,13 +249,17 @@ is_feasible_trace (basic_block bb)
>   imm_use_iterator iter2;
>   FOR_EACH_IMM_USE_FAST (use2_p, iter2, gimple_phi_result (stmt))
> {
> - if (is_gimple_debug (USE_STMT (use2_p)))
> + gimple *use_stmt = USE_STMT (use2_p);
> + if (is_gimple_debug (use_stmt))
> continue;
> - basic_block use_bb = gimple_bb (USE_STMT (use2_p));
> + basic_block use_bb = gimple_bb (use_stmt);
>   if (use_bb != bb
>   && dominated_by_p (CDI_DOMINATORS, bb, use_bb))
> {
> - found_useful_phi = true;
> + if (gcond *cond = dyn_cast <gcond *> (use_stmt))
> +   if (gimple_cond_code (cond) == EQ_EXPR
> +   || gimple_cond_code (cond) == NE_EXPR)
> + found_useful_phi = true;
>   break;
> }
> }
> 
> avoids the splitting and at least passes tree-ssa.exp testing.  Throwing it
> on full testing (there are some path splitting testcases randomly placed
> IIRC).

Bootstrap / regtest went ok.  With this and -O3 -march=native (on a broadwell
CPU) I get

gcc6 -O3 -march=native: 5469.25 Mflops
gcc7 -O3 -march=native: 5439.39 Mflops

but note that with -Ofast -march=native the situation is still bad
(-fno-split-paths doesn't help but -ftree-loop-if-convert does):

gcc6 -Ofast -march=native: 5500.51 Mflops
gcc7 -Ofast -march=native: 4765.56 Mflops
gcc7 -Ofast -march=native -ftree-loop-if-convert: 5335.49 Mflops

Shall I go for the split-path fix for the moment?

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #22 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #21)
> On April 7, 2017 6:57:13 PM GMT+02:00, "jakub at gcc dot gnu.org"
>  wrote:
> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
> >
> >--- Comment #20 from Jakub Jelinek  ---
> >So, Richard, any thoughts on what can be done in split paths to avoid
> >this?
> 
> Invent some new heuristic that avoids splitting this case...

Index: gcc/gimple-ssa-split-paths.c
===
--- gcc/gimple-ssa-split-paths.c(revision 246803)
+++ gcc/gimple-ssa-split-paths.c(working copy)
@@ -249,13 +249,17 @@ is_feasible_trace (basic_block bb)
  imm_use_iterator iter2;
  FOR_EACH_IMM_USE_FAST (use2_p, iter2, gimple_phi_result (stmt))
{
- if (is_gimple_debug (USE_STMT (use2_p)))
+ gimple *use_stmt = USE_STMT (use2_p);
+ if (is_gimple_debug (use_stmt))
continue;
- basic_block use_bb = gimple_bb (USE_STMT (use2_p));
+ basic_block use_bb = gimple_bb (use_stmt);
  if (use_bb != bb
  && dominated_by_p (CDI_DOMINATORS, bb, use_bb))
{
- found_useful_phi = true;
+ if (gcond *cond = dyn_cast <gcond *> (use_stmt))
+   if (gimple_cond_code (cond) == EQ_EXPR
+   || gimple_cond_code (cond) == NE_EXPR)
+ found_useful_phi = true;
  break;
}
}

avoids the splitting and at least passes tree-ssa.exp testing.  Throwing it
on full testing (there are some path splitting testcases randomly placed
IIRC).
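
The decision the patch changes can be modelled outside GCC. The toy below is only a sketch: `struct phi_use`, its fields, the enums and `phi_use_is_useful` are all hypothetical stand-ins for the GIMPLE data the real code walks. It mirrors the patched loop: debug uses are skipped, the first use in a block that dominates bb decides the outcome, and that use only counts when it is a condition comparing with == or !=.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind { STMT_ASSIGN, STMT_COND };
enum cond_code { CODE_NONE, CODE_EQ, CODE_NE, CODE_LT, CODE_GT };

/* One use of the PHI result (toy model, not a GCC type). */
struct phi_use {
    enum stmt_kind kind;      /* what kind of statement uses the value  */
    enum cond_code code;      /* comparison code when kind == STMT_COND */
    int is_debug;             /* debug statements are ignored           */
    int in_dominating_block;  /* use sits in another block that
                                 dominates the candidate block          */
};

/* Returns 1 when the PHI would be treated as exposing a threading
   opportunity under the patched rule, 0 otherwise. */
int phi_use_is_useful(const struct phi_use *uses, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (uses[i].is_debug)
            continue;
        if (uses[i].in_dominating_block) {
            /* As in the patch, the first qualifying use decides. */
            return uses[i].kind == STMT_COND
                   && (uses[i].code == CODE_EQ || uses[i].code == CODE_NE);
        }
    }
    return 0;
}
```

Under this rule a PHI whose dominating use is an ordered comparison (`CODE_GT` in the toy) no longer triggers splitting; before the patch any such use set found_useful_phi.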

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #21 from rguenther at suse dot de  ---
On April 7, 2017 6:57:13 PM GMT+02:00, "jakub at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
>
>--- Comment #20 from Jakub Jelinek  ---
>So, Richard, any thoughts on what can be done in split paths to avoid
>this?

Invent some new heuristic that avoids splitting this case...

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #20 from Jakub Jelinek  ---
So, Richard, any thoughts on what can be done in split paths to avoid this?

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #19 from vincenzo Innocente  ---
Could you please also have a look at C++ and LTO?  This is what I get on my
Skylake; for C++ or LTO, -fno-split-paths pessimizes:
[innocent@vinavx3 scimark2TMP]$ gcc -march=native -Wall -Ofast *.c -lm ;
./a.out | grep LU
LU  Mflops:  5920.14(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ gcc -march=native -Wall -Ofast *.c -lm
-fno-split-paths ; ./a.out | grep LU
LU  Mflops:  6136.33(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ gcc -march=native -Wall -Ofast *.c -lm -flto ;
./a.out | grep LU
LU  Mflops:  5809.93(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ gcc -march=native -Wall -Ofast *.c -lm -flto
-fno-split-paths ; ./a.out | grep LU
LU  Mflops:  5630.24(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ c++ -march=native -Wall -Ofast *.c -lm ;
./a.out | grep LU
LU  Mflops:  6001.47(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ c++ -march=native -Wall -Ofast *.c -lm
-fno-split-paths ; ./a.out | grep LU
LU  Mflops:  5920.14(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ c++ -march=native -Wall -Ofast *.c -lm -flto;
./a.out | grep LU
LU  Mflops:  5434.16(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ c++ -march=native -Wall -Ofast *.c -lm -flto
-fno-split-paths ; ./a.out | grep LU
LU  Mflops:  5434.16(M=100, N=100)

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #18 from rguenther at suse dot de  ---
On Fri, 7 Apr 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390
> 
> --- Comment #16 from Jakub Jelinek  ---
> Has somebody the benchmark around to retry with current trunk, with
> -f{,no-}split-paths and compare that to some older trunk and gcc6?

On a broadwell machine I get (-O3 -march=native)

gcc6: 5507.42 Mflops
gcc7: 4787.26 Mflops
gcc7: 5435.08 Mflops [-fno-split-paths]

so the RTL if-conversion works now unless inhibited by path splitting.

What path splitting does is mostly undone by loop disambiguation which
re-creates the merger so path splitting just made the loop multiple
exit (without simplifying the duplicated exit condition).

So we can add more heuristics to tame down path splitting, for example
never duplicating a joiner that has an exit.  Or adding to the
quite stupid if-cvt mitigation code (missing the minmax case).

Or add even more outs to the threading opportunity detection code...
We currently find that

  t_175 = PHI 

in the merger exposes a threading opportunity because it has one
arg that is unchanged over the latch (t_184 over 6->8) and it has
a use in the threading destination (in the controlling condition
even).

This all just exposes that path splitting is not well integrated
into what it tries to expose (threading).  IMHO it should have been
part of backwards/forward threading.

But that ship has sailed (Jeff approved it).

I've tried to fixup after the MIA authors.  But well.

I can fixup by removing the pass again.  Or adding more oddball
heuristics.  This case which seems important for x86_64 is

for (i=j+1; i<M; i++)
{
    double ab = fabs(A[i][j]);
    if ( ab > t)
    {
        jp = i;
        t = ab;
    }
}

so reducing MAX plus remembering the index of the maximum value.
We're not phiopt-ing that to MAX because it might not be profitable
(the condition has to remain).  So path splitting could be profitable
on some archs.  IFF we wouldn't re-create that shared latch
right afterwards anyway (and forget to propagate single-arg PHIs
resulting from the BB duplication).
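
The reduction described here (maximum absolute value plus the index where it occurs) can be written in several equivalent shapes. The sketch below is illustrative only: it works on a plain array `col` instead of a matrix column, all three function names are hypothetical, and the "split" variant is a hand-written approximation of a loop whose tail is duplicated into both arms, not actual compiler output. Each function expects j < m.

```c
#include <assert.h>
#include <stddef.h>

/* Local absolute value so the example needs no math library. */
static double abs_d(double x)
{
    return x < 0 ? -x : x;
}

/* Branchy form: one conditional block updates both jp and t. */
size_t pivot_branchy(const double *col, size_t j, size_t m)
{
    size_t jp = j;
    double t = abs_d(col[j]);
    for (size_t i = j + 1; i < m; i++) {
        double ab = abs_d(col[i]);
        if (ab > t) {
            jp = i;
            t = ab;
        }
    }
    return jp;
}

/* If-converted form: both updates become selects on one condition,
   which is the shape a pair of conditional moves can implement. */
size_t pivot_select(const double *col, size_t j, size_t m)
{
    size_t jp = j;
    double t = abs_d(col[j]);
    for (size_t i = j + 1; i < m; i++) {
        double ab = abs_d(col[i]);
        int take = ab > t;
        jp = take ? i : jp;
        t = take ? ab : t;
    }
    return jp;
}

/* Split shape: increment and exit test are duplicated into both arms,
   so the loop has two exits and no shared join block. */
size_t pivot_split(const double *col, size_t j, size_t m)
{
    size_t jp = j;
    double t = abs_d(col[j]);
    size_t i = j + 1;
    if (i >= m)
        return jp;
    for (;;) {
        double ab = abs_d(col[i]);
        if (ab > t) {
            jp = i;
            t = ab;
            if (++i >= m)
                break;
        } else {
            if (++i >= m)
                break;
        }
    }
    return jp;
}
```

All three return the same index for the same input; they differ only in control-flow shape, which is what decides whether a later if-conversion pass still sees a single conditional block to turn into selects.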

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-07 Thread vincenzo.innocente at cern dot ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #17 from vincenzo Innocente  ---
[innocent@vinavx3 innocent]$ mkdir scimark2TMP
[innocent@vinavx3 innocent]$ cd scimark2TMP
[innocent@vinavx3 scimark2TMP]$ wget
http://math.nist.gov/scimark2/scimark2_1c.zip .
.
gcc version 7.0.1 20170407 (experimental) [trunk revision 246752] (GCC) 
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm
[innocent@vinavx3 scimark2TMP]$ ./a.out 
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 2783.60
FFT Mflops:  2325.65(N=1024)
SOR Mflops:  2260.36(100 x 100)
MonteCarlo: Mflops:   829.14
Sparse matmult  Mflops:  2582.70(N=1000, nz=5000)
LU  Mflops:  5920.14(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm
-fno-split-paths 
[innocent@vinavx3 scimark2TMP]$ ./a.out
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 2825.86
FFT Mflops:  2333.43(N=1024)
SOR Mflops:  2260.36(100 x 100)
MonteCarlo: Mflops:   829.14
Sparse matmult  Mflops:  2570.04(N=1000, nz=5000)
LU  Mflops:  6136.33(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm -fsplit-paths
[innocent@vinavx3 scimark2TMP]$ ./a.out
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 2787.46
FFT Mflops:  2325.65(N=1024)
SOR Mflops:  2260.36(100 x 100)
MonteCarlo: Mflops:   832.36
Sparse matmult  Mflops:  2582.70(N=1000, nz=5000)
LU  Mflops:  5936.23(M=100, N=100)
[innocent@vinavx3 scimark2TMP]$ pushd ~/code/s7/C
CMSSW_8_0_22/ CMSSW_9_1_0_pre2/ 
[innocent@vinavx3 scimark2TMP]$ pushd ~/code/s7/CMSSW_9_1_0_pre2/
~/code/s7/CMSSW_9_1_0_pre2 /tmp/innocent/scimark2TMP 
[innocent@vinavx3 CMSSW_9_1_0_pre2]$ cmsenv
[innocent@vinavx3 CMSSW_9_1_0_pre2]$ popd
/tmp/innocent/scimark2TMP 
[innocent@vinavx3 scimark2TMP]$ gcc -v
gcc version 6.3.0 (GCC) 
[innocent@vinavx3 scimark2TMP]$ gcc -Ofast -march=haswell *.c -lm 
[innocent@vinavx3 scimark2TMP]$ ./a.out
**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score: 2820.21
FFT Mflops:  2325.65(N=1024)
SOR Mflops:  2260.36(100 x 100)
MonteCarlo: Mflops:   810.37
Sparse matmult  Mflops:  2427.26(N=1000, nz=5000)
LU  Mflops:  6277.39(M=100, N=100)

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-06 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #16 from Jakub Jelinek  ---
Has somebody the benchmark around to retry with current trunk, with
-f{,no-}split-paths and compare that to some older trunk and gcc6?

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-06 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #15 from Rainer Orth  ---
Author: ro
Date: Thu Apr  6 13:11:21 2017
New Revision: 246729

URL: https://gcc.gnu.org/viewcvs?rev=246729&root=gcc&view=rev
Log:
Fix gcc.target/i386/pr79390.c for Solaris as

PR tree-optimization/79390
* gcc.target/i386/pr79390.c: Allow for cmovl.a.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/pr79390.c

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #14 from Jakub Jelinek  ---
Author: jakub
Date: Tue Apr  4 17:52:27 2017
New Revision: 246686

URL: https://gcc.gnu.org/viewcvs?rev=246686&root=gcc&view=rev
Log:
PR tree-optimization/79390
* target.h (struct noce_if_info): Declare.
* targhooks.h (default_noce_conversion_profitable_p): Declare.
* target.def (noce_conversion_profitable_p): New target hook.
* ifcvt.h (struct noce_if_info): New type, moved from ...
* ifcvt.c (struct noce_if_info): ... here.
(noce_conversion_profitable_p): Renamed to ...
(default_noce_conversion_profitable_p): ... this.  No longer
static nor inline.
(noce_try_store_flag_constants, noce_try_addcc,
noce_try_store_flag_mask, noce_try_cmove, noce_try_cmove_arith,
noce_convert_multiple_sets): Use targetm.noce_conversion_profitable_p
instead of noce_conversion_profitable_p.
* config/i386/i386.c: Include ifcvt.h.
(ix86_option_override_internal): Don't override
PARAM_MAX_RTL_IF_CONVERSION_INSNS default.
(ix86_noce_conversion_profitable_p): New function.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): Redefine.
* config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Adjust comment.
* doc/tm.texi.in (TARGET_NOCE_CONVERSION_PROFITABLE_P): Add.
* doc/tm.texi: Regenerated.

* gcc.target/i386/pr79390.c: New test.
* gcc.dg/ifcvt-4.c: Use -mtune-ctrl=^one_if_conv_insn for i?86/x86_64.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr79390.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/x86-tune.def
trunk/gcc/doc/tm.texi
trunk/gcc/doc/tm.texi.in
trunk/gcc/ifcvt.c
trunk/gcc/ifcvt.h
trunk/gcc/target.def
trunk/gcc/target.h
trunk/gcc/targhooks.h
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/ifcvt-4.c

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-04-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

--- Comment #13 from Jakub Jelinek  ---
Created attachment 41097
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41097&action=edit
gcc7-pr79390.patch

Untested fix.

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

[Bug tree-optimization/79390] [7 Regression] 10% performance drop in SciMark2 LU after r242550

2017-03-28 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |6.3.1
   Target Milestone|---                         |7.0
            Summary|10% performance drop in     |[7 Regression] 10%
                   |SciMark2 LU after r242550   |performance drop in
                   |                            |SciMark2 LU after r242550

--- Comment #12 from Richard Biener  ---
On more recent trunk -fno-split-paths makes only a tiny difference (4882 vs.
4779 Mflops) while -ftree-loop-if-convert still results in 5432 Mflops.  GCC 6
scores 5523 Mflops for me (-O3 -march=native on a Broadwell CPU).

Marking as regression.