[Bug rtl-optimization/68898] ICE if rtl if-conversion is off.

2015-12-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68898 --- Comment #2 from Yuri Rumyantsev --- Forgot to add stack trace: Error: dominator of 6 status unknown t2.f:41:0: internal compiler error: Segmentation fault 0xb4e62f crash_signal

[Bug rtl-optimization/68898] ICE if rtl if-conversion is off.

2015-12-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68898 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37028 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37028&action=edit test-case to reproduce Need to compile with -O2 -m32 -ffast-math options to reproduce. Note that 32-bit and

[Bug rtl-optimization/68898] New: ICE if rtl if-conversion is off.

2015-12-14 Thread ysrumyan at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- I tried to play with if-conversion flag and got ICE on all benchspec2 from spec2000 suite. I attach simple Fortran reproducer. Note that "-fno-if-conversion2" option doe

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #23 from Yuri Rumyantsev --- Richard, Do we have any chance to vectorize attached test-case using GCC6 compiler?

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-12-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #24 from Yuri Rumyantsev --- Richard, Do we have any chance to vectorize attached test-case using GCC6 compiler?

[Bug middle-end/68542] [6 Regression] 10% 481.wrf performance regression

2015-11-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68542 --- Comment #3 from Yuri Rumyantsev --- I enhanced the patch that moves masked stores under a zero-mask guard: it now also moves all possible producers of the stored value, and the performance degradation disappeared. The patch will be redesigned and sent for review.

[Bug middle-end/67438] [6 Regression] ~X op ~Y pattern relocation causes loop performance degradation on 32bit x86

2015-11-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438 --- Comment #11 from Yuri Rumyantsev --- In fact, the problem is quite different although it is caused by non-profitable pattern matching ~X CMP ~Y -> Y CMP X. In general this pattern may be helpful if we can delete not operation, e.g. x1 =

[Bug rtl-optimization/68435] [6 Regression] Missed if-conversion optimization

2015-11-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68435 --- Comment #6 from Yuri Rumyantsev --- It turned out that fresh gcc performs tail duplication (aka path splitting) preventing if-conversion. So I post a dump for 20150929 compiler which reproduces the issue.

[Bug rtl-optimization/68435] [6 Regression] Missed if-conversion optimization

2015-11-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68435 --- Comment #7 from Yuri Rumyantsev --- Created attachment 36780 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36780&action=edit rtl-ce1 dump file The dump is for 20150929 compiler

[Bug rtl-optimization/68435] [6 Regression] Missed if-conversion optimization

2015-11-19 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68435 --- Comment #4 from Yuri Rumyantsev --- Created attachment 36774 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36774&action=edit tar file tar file contains good and bad ce1-rtl dumps showing the problem

[Bug rtl-optimization/68435] [6 Regression] Missed if-conversion optimization

2015-11-19 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68435 --- Comment #2 from Yuri Rumyantsev --- I will post 2 rtl dumps for ce1 phase produced with -O2 -m32 options on ix86. You can see that file t21.c.203r.ce1 produced by 20110927 compiler contains 3 possible IF blocks searched. 1 IF blocks

[Bug middle-end/67438] [6 Regression] ~X op ~Y pattern relocation causes loop performance degradation on 32bit x86

2015-11-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438 --- Comment #9 from Yuri Rumyantsev --- It looks like such transformation is profitable if only def statements have a single use, i.e. it looks reasonable for if (255 - a) > (255 -b) /* a,b have char type. */ but it does not look reasonable

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021 --- Comment #4 from Yuri Rumyantsev --- Indeed, there is an issue with outer-loop unswitching - it should not be performed for infinite loops. But if we slightly modify the test to have a finite outer loop, we get the same error: char a; void fn1(char

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-21 Thread ysrumyan at gmail dot com
/bugzilla/show_bug.cgi?id=68021 > > H.J. Lu changed: > >What|Removed |Added > > CC| |ysrumyan at gmail dot com > >

[Bug tree-optimization/68021] [6 Regression] ice in rewrite_use_nonlinear_expr with -O3

2015-10-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68021 --- Comment #3 from Yuri Rumyantsev --- It looks like the pass that unswitches outer loops simply triggers the issue, and this is a tree-ssa-loop-ivopts issue.

[Bug tree-optimization/67947] [6 Regression] wrong code at -O3 on x86_64-linux-gnu

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67947 --- Comment #2 from Yuri Rumyantsev --- revision 228760 must fix this bug.

[Bug tree-optimization/67909] [6 Regression] 416.gamess in SPEC CPU 2006 is miscompiled

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909 --- Comment #4 from Yuri Rumyantsev --- Created attachment 36498 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36498&action=edit proposed patch This patch cures run-time error for 416.gamess.

[Bug tree-optimization/67920] [6 Regression] wrong code with -O3

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67920 --- Comment #8 from Yuri Rumyantsev --- Please check that revision 228760 will cure your issue.

[Bug tree-optimization/67909] [6 Regression] 416.gamess in SPEC CPU 2006 is miscompiled

2015-10-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67909 --- Comment #3 from Yuri Rumyantsev --- The check that the guard edge is around the inner loop was missing. After adding it, 416.gamess runs successfully. I sent the fix for review.

[Bug rtl-optimization/67206] New: Redundant spills in simple copy loop for 32-bit x86 target

2015-08-13 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- For attached simple test-case we can see strange spills to stack, namely for (i=0; i<n; i++) out[j * n + i] = in[j * n + i

[Bug rtl-optimization/67206] Redundant spills in simple copy loop for 32-bit x86 target

2015-08-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67206 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 36180 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36180&action=edit test-case to reproduce Must be compiled with -O3 -m32 -march=slm to reproduce.

[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code

2015-08-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #34 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 36138 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36138&action=edit simple reproducer Use -O3 -std=c++14 options to compile and -fno-tree-loop

[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code

2015-08-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #33 from Yuri Rumyantsev ysrumyan at gmail dot com --- With the current compiler there is no performance difference between the by-ref and by-val test-cases, but if we turn off the if-convert transformation we get a ~2X speed-up: on Intel(R) Xeon

[Bug tree-optimization/66951] [6 Regression] ICE at -O3 on x86_64-linux-gnu, verify_ssa failed

2015-07-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66951 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug tree-optimization/66926] [6 regression] FAIL: gfortran.dg/graphite/vect-pr40979.f90 -O (internal compiler error)

2015-07-21 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- I have a fix in my local area which cures the ICE and performs outer-loop vectorization: vect-pr40979.f90:8:0: note: LOOP VECTORIZED vect-pr40979.f90:8:0: note: OUTER LOOP VECTORIZED

[Bug tree-optimization/66926] [6 regression] FAIL: gfortran.dg/graphite/vect-pr40979.f90 -O (internal compiler error)

2015-07-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66926 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Could somebody provide me with instructions on how to build the trunk (fresh) compiler with graphite? Thanks.

[Bug lto/66752] spec2000 255.vortex performance compiled with GCC is ~20% lower than with CLANG

2015-07-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug lto/66752] spec2000 255.vortex performance compiled with GCC is ~20% lower than with CLANG

2015-07-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35947 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35947&action=edit test-case to reproduce compile with -Ofast -m32 -march=slm and notice redundant test

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com --- The original test-case is still not vectorized with Richard's patch for sccvn.

[Bug tree-optimization/66142] New: Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-14 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- The attached test-case compiled with -Ofast -fopenmp -march=core-avx2 options contains loop marked with pragma omp

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2015-05-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35541 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35541&action=edit test-case to reproduce Must be compiled with -Ofast -fopenmp -march=core-avx2 options.

[Bug target/64691] Suboptimal register allocation for bytes comparison on i386

2015-05-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35526 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35526&action=edit test-case to reproduce and assembly file.

[Bug target/64691] Suboptimal register allocation for bytes comparison on i386

2015-05-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug lto/65950] Loop is not vectorized with lto.

2015-05-05 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- The function containing given loop is marked as: foo/24 (foo) @0x7f39f4b84620 Type: function definition analyzed Visibility: prevailing_def_ironly References: Referring

[Bug lto/65950] Loop is not vectorized with lto.

2015-04-30 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35432 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35432&action=edit test-case to reproduce Must be compiled with -Ofast and -fopenmp options.

[Bug lto/65950] New: Loop is not vectorized with lto.

2015-04-30 Thread ysrumyan at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- If we compile attached test-case without lto, e.g. using -Ofast and -fopenmp loop in foo is vectorized but if we add -flto option it won't be vectorized. The problem is 'exit' statement

[Bug rtl-optimization/65698] Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35257 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35257&action=edit assembly for test.c Additional option '-march=slm' was used for it but it is non

[Bug rtl-optimization/65698] New: Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com For attached test-case in inner loop we can see the following deficiencies: 1. 2 redundant fills and one spill in comparison part of loop - I assume

[Bug rtl-optimization/65698] Non-optimal code for simple compare function for x86 32-bit target

2015-04-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35256 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35256&action=edit test-case to reproduce It needs to be compiled with -O3 -m32 options.

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35203 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35203&action=edit test-case to reproduce

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35202 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35202&action=edit test-case to reproduce Need to compile with -O2 flag only.

[Bug rtl-optimization/65651] New: Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Compiling the attached bad.c with only the -O2 option, we can see that a redundant cmp with zero instruction is generated: subl %r9d, %eax cmpl $0, %eax

[Bug rtl-optimization/65651] Redundant cmp with zero instruction in loop for x86 target.

2015-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Jakub, Thanks for your comments. We will try to fix this issue ourselves. Best regards. Yuri. P.S. Note that icc does not produce such redundant cmp with zero. 2015-04-01 16

[Bug tree-optimization/65494] New: [5.0 Regression] Loop is not vectorized because of operand canonicalization.

2015-03-20 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com The 5.0 compiler does not vectorize a simple loop extracted from geekbench, but the 4.9 compiler does. This is caused by different operand ordering

[Bug tree-optimization/65494] [5.0 Regression] Loop is not vectorized because of operand canonicalization.

2015-03-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65494 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 35072 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35072&action=edit test-case to reproduce The following options are used to reproduce: -Ofast -funroll

[Bug tree-optimization/65206] New: Vectorized version of loop is removed.

2015-02-25 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com I noticed that vectorized version of loop is deleted although compiler reports that it was successfully vectorized: t1.c:7:3: note: LOOP VECTORIZED but after we can see in vect-dump: Removing

[Bug tree-optimization/65206] Vectorized version of loop is removed.

2015-02-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34867 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34867&action=edit test-case to reproduce Test needs to be compiled with -Ofast -m64 -mcore-avx2 options.

[Bug target/65161] ICE: in vec_haifa_insn_data, va_heap, vl_embed::operator[], at vec.h:736 with -O3 -fselective-scheduling2 -mtune=slm

2015-02-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug target/65161] ICE: in vec_haifa_insn_data, va_heap, vl_embed::operator[], at vec.h:736 with -O3 -fselective-scheduling2 -mtune=slm

2015-02-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65161 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34856 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34856&action=edit possible patch Add check on selective scheduling to not perform instruction

[Bug rtl-optimization/65135] New: Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed a 10% regression on one important benchmark used for testing x86 32-bit platforms. This regression can be reproduced on the attached test-case: one more fill is present

[Bug rtl-optimization/65135] Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34814 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34814&action=edit test-case to reproduce Need to compile with -O2 -m32 -fPIE -pie options.

[Bug rtl-optimization/65135] [5 Regression] Performance regression in pic mode after r220674.

2015-02-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65135 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- This patch improves performance of almost all benchmarks in pic-mode for 32-bit target, but we have only one huge degradation, on a benchmark from the eembc1.1 suite. I mentioned

[Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2

2015-02-16 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34782 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34782&action=edit test-case to reproduce Options -m32 -msse2 -O3 must be used.

[Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2

2015-02-16 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Using attached simple test-case extracted from codec we found out that 4.8.2 compiler generates more compact binaries in comparison

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2015-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #19 from Yuri Rumyantsev ysrumyan at gmail dot com --- Andrew! Could you please try modified test-case (test1.c) which is attached. Thanks.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2015-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #20 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34700 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34700&action=edit another test-case

[Bug middle-end/64809] [5 Regression] ICE at -O3 with -g enabled on x86_64-linux-gnu (in 32-bit mode)

2015-01-27 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64809 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug tree-optimization/64746] Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34551 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34551&action=edit proposed patch Patch to cure vectorization issue.

[Bug tree-optimization/64746] New: Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Attached simple test-case extracted from important suite is not vectorized even if 'pragma omp simd' is used since

[Bug tree-optimization/64746] Loop with nested load/stores is not vectorized using aggressive if-conversion.

2015-01-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64746 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34548 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34548&action=edit test-case to reproduce. Need to compile this test on x86 with option -O3 -fopenmp

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-30 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34363 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34363&action=edit patch to fix issue This patch fixed almost all issues related to operand

[Bug tree-optimization/64434] New: Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed huge regression on eembc1.1 and eembc2.0 for 32-bit target at x86. It can be reproduced on attached test-case: before this fix number

[Bug tree-optimization/64434] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34345 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34345&action=edit simple reproducer Need to compile with -m32 on x86 platform.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- I put into attachment two assembly files for test-case compiled with -O2 -m32 -S options.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34348 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34348&action=edit assembly files for test.c Assembly file for test.c

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 34349 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34349&action=edit assembly file before r216728 Assembly file.

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com --- H.J. I put before/after assembly files into bug attachment. We saw a slowdown on SLM and HSW for 32-bit on eembc2.0, e.g. des degraded by 36% (SLM) and 7% (HSW). But we did not see

[Bug tree-optimization/64434] [5 Regression] Performance regression after operand canonicalization (r216728).

2014-12-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64434 --- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com --- The issue is caused by operand canonicalization, i.e. there is a special operand ordering for commutative operations to have the same representation for a + b and b

[Bug tree-optimization/63743] Thumb1: big regression for float operators by r216728

2014-12-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63743 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug tree-optimization/63941] [5 Regression] ICE on valid code at -O3 and above on x86_64-linux-gnu

2014-11-28 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63941 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- My patch is responsible for ICE - I did not assume that before if-convert phase cfg may contain redundant degenerative conditional branches: bb 4: ... _14 = d[pretmp_51

[Bug other/61391] [5 Regression] ICE in execute_one_pass at -O3 and above

2014-11-07 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391 --- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com --- Arseny, I am not able to close this bug but you can do it.

[Bug tree-optimization/61743] [5 Regression] Complete unroll is not happened for loops with short upper bound

2014-10-24 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, Did you have a chance to look at this and prepare more general fix? Thanks. Yuri. 2014-09-08 15:13 GMT+04:00 rguenther at suse dot de gcc-bugzi...@gcc.gnu.org: https

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- I checked that our benchmark is successfully vectorized with function inlining. So this bug must be closed as fixed/resolved.

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- You can close this bug as fixed/resolved (see my comment). Thanks. Yuri. 2014-09-08 15:29 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org

[Bug tree-optimization/61743] [5 Regression] Complete unroll is not happened for loops with short upper bound

2014-09-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #10 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, Do you have any progress? Thanks. 2014-08-13 12:35 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-09-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any updates? Thanks.

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 --- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com --- Please ignore my previous comment - if we insert nullifying of destination register before each popcnt (and lzcnt) performance will restore: original test results: unsigned

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 --- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com --- This is not u32 version but u64. The first loop (u32) version looks like: .L23: leal 1(%rdx), %ecx xorq %rax, %rax popcntq (%rbx,%rax,8), %rax leal

[Bug target/62011] False Data Dependency in popcnt instruction

2014-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan

[Bug tree-optimization/61743] [4.10 Regression] Complete unroll is not happened for loops with short upper bound

2014-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I tested both proposed fixes and it turned out that the first one is preferable since the performance of the benchmark came back. Note that hoisting 2nd vrp pass gave us another

[Bug tree-optimization/61743] [4.10 Regression] Complete unroll is not happened for loops with short upper bound

2014-08-07 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any comments will be appreciated.

[Bug tree-optimization/62021] New: ICE in verify_gimple_assign_single

2014-08-05 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com For attached simple test-case if we omit 'uniform' specification compiler produces ICE: error: incorrect type of vector CONSTRUCTOR elements Note that for stmt _38 = {vect_cst_.62_39, vect_cst_

[Bug tree-optimization/62021] ICE in verify_gimple_assign_single

2014-08-05 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62021 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33247 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33247&action=edit test-case to reproduce Test should be compiled with -O2 -fopenmp -march=core-avx2

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Any comments will be appreciated.

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33235 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33235&action=edit file to reproduce Needs to be compiled with -m32 -O3 -Wframe-larger-than=1728 -std=gnu

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I put into attachment original file. For compiler built 20140208 and 20140730 I've got: grep -c redundant test.cc.179r.pre (20140208) 3825 grep -c redundant test

[Bug rtl-optimization/61672] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com --- Richard, I put the original file into 61672 attachment and add comments for reproducing. 2014-08-04 15:16 GMT+04:00 rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https

[Bug rtl-optimization/61672] [4.9/4.10 Regression] Less redundant instructions deleted by pre_delete after r208113.

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61672 --- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com --- It really fixes the issue. Thanks.

[Bug tree-optimization/62012] New: Loop is not vectorized after function inlining (SCEV)

2014-08-04 Thread ysrumyan at gmail dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We noticed that for one important benchmark using '-lto' options leads to performance degradation which is caused by not-vectorizing the hottest loop after function inlining. I

[Bug tree-optimization/62012] Loop is not vectorized after function inlining (SCEV)

2014-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62012 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33241 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33241&action=edit test-case to reproduce Options to compile are: -Ofast -m64 -march=core-avx2 -fopenmp

[Bug tree-optimization/61822] gcc.dg/vect/vect-cond-reduc-1.c FAILs

2014-07-22 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822 --- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com --- Hi Rainer, Could you try attached patch to check if it helps (test should not be run for sparc). Thanks ahead. Yuri.. 2014-07-16 19:20 GMT+04:00 ro at gcc dot gnu.org gcc

[Bug tree-optimization/61822] gcc.dg/vect/vect-cond-reduc-1.c FAILs

2014-07-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61822 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- It looks like /* { dg-require-effective-target vect_condition } */ directive was missed in vect-cond-reduc-1.c test. I will fix it asap.

[Bug tree-optimization/61743] New: Complete unroll is not happened for loops with short upper bound

2014-07-08 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com We discovered significant performance regression on one important benchmark from eembc2.0 suite after r211625. It turned out that complete unroll

[Bug tree-optimization/61743] Complete unroll is not happened for loops with short upper bound

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61743 --- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com --- Created attachment 33088 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33088&action=edit test-case to reproduce Use '-O3 -funroll-loops -Dbtype=[int,e_u8]' to reproduce.

[Bug tree-optimization/61742] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com --- This is a duplicate of PR 61576 and it should pass after r212347.

[Bug tree-optimization/61742] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-07-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- Ok. I will add it. 2014-07-08 14:45 GMT+04:00 jakub at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61742 --- Comment #3 from Jakub

[Bug rtl-optimization/61672] New: Less redundant instructions deleted by pre_delete after r208113.

2014-07-02 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com In a real application which is compiled with restrictions on frame size, after r208113 the number of deleted redundant instructions decreased significantly

[Bug other/61391] [4.10 Regression] ICE in execute_one_pass at -O3 and above

2014-06-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61391 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- It turned out that wrong PR number was used in ChangeLog. In fact this bug was fixed: URL: http://gcc.gnu.org/viewcvs?rev=211263&root=gcc&view=rev Log: gcc/ PR tree-optimization

[Bug tree-optimization/61576] [4.10 Regression] wrong code at -O3 on x86_64-linux-gnu

2014-06-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61576 --- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com --- There is an issue with phi-node and reduction stmt - after r211302 new hammock was inserted between reduction stmt and bb containing phi: bb 6: d.6_12 = d_lsm.14_17 + 1

[Bug tree-optimization/61518] [4.10 Regression] wrong code (by tree vectorizer) at -O3 on x86_64-linux-gnu

2014-06-16 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61518 Yuri Rumyantsev ysrumyan at gmail dot com changed: What|Removed |Added CC||ysrumyan
