[Bug tree-optimization/78899] New: [7 Regression] Vectorized loop with optimized mask stores motion is completely deleted after r242520.

2016-12-22 Thread ysrumyan at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Created attachment 40395 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40395=e

[Bug target/78794] [7 Regression] We noticed ~9% regression in 32-bit mode for 462.libquantum on Avoton after r243202

2016-12-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78794 --- Comment #9 from Yuri Rumyantsev --- Hi Uros, I checked that with your patch performance is recovered on the Avoton machine: 462.libquantum before 18.4000, after 20.9000 (+13.58%). Best regards. Yuri.

[Bug target/78794] [7 Regression] We noticed ~9% regression in 32-bit mode for 462.libquantum on Avoton after r243202

2016-12-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78794 --- Comment #1 from Yuri Rumyantsev --- Created attachment 40322 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40322&action=edit test-case to reproduce Compile with -O2 -march=slm -m32 options to reproduce.

[Bug target/78794] New: [7 Regression] We noticed ~9% regression in 32-bit mode for 462.libquantum on Avoton after r243202

2016-12-13 Thread ysrumyan at gmail dot com
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- STV phase does not recognize some patterns after this revision, regression can be reproduced

[Bug rtl-optimization/78634] New: [7 Regression] 30% performance drop after r242832.

2016-12-01 Thread ysrumyan at gmail dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Created attachment 40215 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40215&action=edit test-case to reproduce We noticed a huge performance regression on

[Bug tree-optimization/78496] New: Missed opportunities for jump threading

2016-11-23 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Created attachment 40131 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40131&action=edit test-case to reproduce, compile with -O3 option. We noticed a huge performa

[Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038

2016-11-15 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348 --- Comment #5 from Yuri Rumyantsev --- Yes, I think so. 2016-11-15 14:49 GMT+03:00 rguenth at gcc dot gnu.org : > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348 > > Richard Biener changed: > >What

[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test

2016-11-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445 --- Comment #4 from Yuri Rumyantsev --- Ping. Do you have any progress on this? Thanks.

[Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038

2016-11-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348 --- Comment #1 from Yuri Rumyantsev --- Created attachment 40036 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40036&action=edit test-case to reproduce Must be compiled with -O3 option to reproduce.

[Bug tree-optimization/78348] New: [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038

2016-11-14 Thread ysrumyan at gmail dot com
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed huge (>15%) performance drop after fix in loop distribution phase. Before fix fix distribut

[Bug ipa/78268] [7 Regression] internal compiler error: Segmentation fault

2016-11-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78268 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug rtl-optimization/78116] [7 regression] Performance drop after r241173 on avx512 target

2016-10-31 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116 --- Comment #7 from Yuri Rumyantsev --- Compiler was configured with: Configured with: /configure --enable-languages=c,c++ --enable-clocale=gnu --enable-cloog-backend=isl --enable-shared --disable-libsanitizer --disable-bootstrap --disable-nls

[Bug rtl-optimization/78116] [7 regression] Performance drop after r241173 on avx512 target

2016-10-27 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116 --- Comment #5 from Yuri Rumyantsev --- Yes, some virtual registers are allocated on the stack, so we get more loads from the stack to fetch their values.

[Bug rtl-optimization/78116] [7 regression] Performance drop after r241173 on avx512 target

2016-10-27 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116 --- Comment #3 from Yuri Rumyantsev --- Created attachment 39910 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39910&action=edit another test-case Must be compiled with "-Ofast -fopenmp -funroll-loops -march=knl"

[Bug rtl-optimization/78116] [7 regression] Performance drop after r241173 on avx512 target

2016-10-27 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116 --- Comment #2 from Yuri Rumyantsev --- We also found a performance drop on another important benchmark with the same symptoms after r241170, namely the loop marked with .L18 has 12 more fills from the stack. The test-case will be attached.

[Bug rtl-optimization/78116] [7 regression] Performance drop after r241173 on avx512 target

2016-10-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116 --- Comment #1 from Yuri Rumyantsev --- Created attachment 39892 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39892&action=edit test-case to reproduce Must be compiled with "-Ofast -funroll-loops -march=knl" options.

[Bug rtl-optimization/78116] New: [7 regression] Performance drop after r241173 on avx512 target

2016-10-26 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- I attached the simple test-case to reproduce issue. Before this revision loop marked with label .L27 has 25 instructions

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007 --- Comment #1 from Yuri Rumyantsev --- Created attachment 39821 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39821&action=edit test-case to reproduce It is sufficient to compile it with -Ofast option on x86 platform.

[Bug target/78007] New: Important loop from 482.sphinx3 is not vectorized

2016-10-17 Thread ysrumyan at gmail dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- The issue is related to missing support for __builtin_bswap32: t1.c:9:3: note: function is not vectorizable. t1.c:9:3: note: not vectorized: relevant stmt not supported
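For readers without access to the attachment, the kind of loop this report describes can be sketched roughly as follows. This is only an illustrative guess, not the attached t1.c: the function name `swap_words` and its parameters are hypothetical, and the only element taken from the report is the use of `__builtin_bswap32` inside the loop body.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction, not the reporter's test-case: every iteration
   byte-swaps one 32-bit word with the GCC builtin named in the report. */
void swap_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = __builtin_bswap32(src[i]);
}
```

Compiling such a file with -Ofast and -fopt-info-vec-missed is one way to see whether a given GCC version prints a "not vectorized" note for the loop.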

[Bug tree-optimization/77498] [7 regression] Performance drop after r239414 on spec2000/172mgrid

2016-09-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498 --- Comment #1 from Yuri Rumyantsev --- Created attachment 39574 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39574&action=edit test-case to reproduce Need to compile with -O2 -ffast-math to reproduce.

[Bug tree-optimization/77498] New: [7 regression] Performance drop after r239414 on spec2000/172mgrid

2016-09-06 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed significant regression after https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=239414 I attached simple routine

[Bug rtl-optimization/71956] [7 Regression] 176.gcc fails on 32 bits when compiled with -march=core-avx2

2016-09-02 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71956 --- Comment #5 from Yuri Rumyantsev --- This bug is fixed by Author: ppalka Date: Sat Aug 27 22:00:17 2016 New Revision: 239798 URL: https://gcc.gnu.org/viewcvs?rev=239798&root=gcc&view=rev Log: Fix folding of VECTOR_CST comparisons gcc/ChangeLog:

[Bug tree-optimization/77445] [7 Regression] Performance drop after r239219 on coremark test

2016-09-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77445 --- Comment #1 from Yuri Rumyantsev --- Created attachment 39535 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39535&action=edit test-case to reproduce It is sufficient to compile it with -Ofast option.

[Bug tree-optimization/77445] New: [7 Regression] Performance drop after r239219 on coremark test

2016-09-01 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed huge (32%) performance drop on coremark-pro/core (former coremark benchmark) after http://gcc.gnu.org/viewcvs/gcc

[Bug target/77344] Internal Compiler Error with arch knl

2016-08-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77344 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug tree-optimization/71077] [7 Regression] gcc -lto raises ICE

2016-08-18 Thread ysrumyan at gmail dot com
org>: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71077 > > --- Comment #6 from patrick at parcs dot ath.cx --- > On Fri, 12 Aug 2016, ysrumyan at gmail dot com wrote: > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71077 >> >> Yuri Rumyantsev cha

[Bug tree-optimization/71077] [7 Regression] gcc -lto raises ICE

2016-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71077 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug rtl-optimization/71956] [7 Regression] 176.gcc fails on 32 bits when compiled with -march=core-avx2

2016-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71956 --- Comment #4 from Yuri Rumyantsev --- Should read: "problem file is 176.gcc/src/sched.c, problem function sched_analyze_insn".

[Bug rtl-optimization/71956] [7 Regression] 176.gcc fails on 32 bits when compiled with -march=core-avx2

2016-08-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71956 --- Comment #3 from Yuri Rumyantsev --- It turned out that after r235653 (with minor int->bool type change) 176.gcc started RF. If we turn off vrp phase benchmark passes. The problem fail is sched.c. Note that avx2 is essential for reproducing.

[Bug rtl-optimization/70467] Useless "and [esp],-1" emitted on AND with uint64_t variable

2016-08-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70467 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug rtl-optimization/71956] [7 Regression] 176.gcc fails on 32 bits when compiled with -march=core-avx2

2016-08-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71956 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug testsuite/72850] [7 Regression] FAIL: gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 4

2016-08-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72850 --- Comment #4 from Yuri Rumyantsev --- Created attachment 39093 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39093&action=edit test-case to reproduce It is sufficient to use the -Ofast option to compile on an x86 machine.

[Bug testsuite/72850] [7 Regression] FAIL: gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 4

2016-08-10 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72850 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug c/72794] [7 regression] CF on spec2000/176.gcc after r238862.

2016-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72794 --- Comment #6 from Yuri Rumyantsev --- Thanks for clarification. This bug can be closed as user misunderstanding. 2016-08-04 14:08 GMT+03:00 rguenth at gcc dot gnu.org : >

[Bug c/72794] [7 regression] CF on spec2000/176.gcc after r238862.

2016-08-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72794 --- Comment #4 from Yuri Rumyantsev --- I assume that there is still issue in lto part of compiler - even if we ignore "inline" attribute we (lto) must not delete such functions from binaries. So this bug must be forwarded to lto phase.

[Bug c/72794] [7 regression] CF on spec2000/176.gcc after r238862.

2016-08-03 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72794 --- Comment #2 from Yuri Rumyantsev --- Yes, this option cures CF. Does it mean that we must compile spec2000 with this flag? 2016-08-03 19:08 GMT+03:00 pinskia at gcc dot gnu.org : >

[Bug c/72794] New: [7 regression] CF on spec2000/176.gcc after r238862.

2016-08-03 Thread ysrumyan at gmail dot com
Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed that after this commit benchmark is failed with message: /tmp/cchqWD0Q.ltrans0.ltrans.o: In function `yylex': :(.text+0x566e): undefined reference

[Bug tree-optimization/72739] New: [7 Regression] FAIL: gcc.dg/vect/vect-mask-store-move-1.c after r238301

2016-07-28 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed that after this revision test is failed: FAIL: gcc.dg/vect/vect-mask-store-move-1.c scan-tree-dump-times

[Bug tree-optimization/56688] static/saved variables prevent loop vectorization.

2016-07-22 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56688 --- Comment #8 from Yuri Rumyantsev --- I checked that if we comment out the 'save' stmt in thin6d.f all loops will be vectorized: grep -c 'LOOP VECTORIZED' thin6d.f.149t.vect 32

[Bug tree-optimization/56688] Fortran save statement prevents loop vectorization.

2016-07-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56688 --- Comment #7 from Yuri Rumyantsev --- I checked that the GCC 7 compiler still does not vectorize loops in the thin6d function, which is the hottest function in the 200.sixtrack benchmark.

[Bug rtl-optimization/65698] Non-optimal code for simple compare function for x86 32-bit target

2016-07-20 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65698 --- Comment #3 from Yuri Rumyantsev --- I see that this bug has not been considered for a while. Here is my additional comment. First of all, this test was extracted from bzip2 benchmark, mainGTU function. The problem is that (1) tree optimizer

[Bug middle-end/71734] [7 Regression] FAIL: libgomp.fortran/simd4.f90 -O3 -g execution test

2016-07-19 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71734 --- Comment #7 from Yuri Rumyantsev --- H.J. I've just checked this test with my local fixed compiler and got: Running /users/ysrumyan/workspaces/71261/gcc/testsuite/g++.dg/vect/vect.exp ... PASS: g++.dg/vect/pr70729.cc -std=c++11

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-07-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #37 from Yuri Rumyantsev --- Jakub, I assume that your #C33 test-case is not correct, i.e. it can not be marked with pragma omp simd. For example, even if we turn off lim phase it will be aborted: my_g++ -O3 -m64 t33.cpp -o

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-07-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #36 from Yuri Rumyantsev --- #c33 testcase was not tested since I have some doubts about it. Note that original problem was #pragma omp simd for (int i=0; i

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-07-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #34 from Yuri Rumyantsev --- Thanks a lot Jakub for your detailed comments. I have a simple fix which cures failures from 71734. The fix is simple enough and simply checks that the ref in problem belongs to simd loop: diff --git

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-06-10 Thread ysrumyan at gmail dot com
gcc-bugzi...@gcc.gnu.org>: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 > > --- Comment #24 from rguenther at suse dot de --- > On Wed, 8 Jun 2016, ysrumyan at gmail dot com wrote: > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 >> >> --- Comment #23 from Yuri

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-06-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #23 from Yuri Rumyantsev --- OK. I will try to prepare the second part of patch. Few comments about vect-simd-clone-5.c test failure. 1. This loop is marked with safelen=MAX_INT. 2. It contains the following stmt's: D.3301 =

[Bug rtl-optimization/71453] Spills to vector registers are sub-optimal.

2016-06-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71453 --- Comment #2 from Yuri Rumyantsev --- Forgot to mention that the number of instructions is 10% higher: 632 vs 702 for spills into vector registers.

[Bug rtl-optimization/71453] Spills to vector registers are sub-optimal.

2016-06-08 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71453 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38659 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38659&action=edit test-case to reproduce Must be compiled with -O2 -march=core-avx2 -m32 options.

[Bug rtl-optimization/71453] New: Spills to vector registers are sub-optimal.

2016-06-08 Thread ysrumyan at gmail dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We notice significant performance regression on one important benchmark after r235523. Note that fix is not responsible for it. A problem is related to spill/fill

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-06-07 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #21 from Yuri Rumyantsev --- Richard! Are you planning to prepare the second part of the patch (zeroing safelen and testing it in loop invariant motion phase as you proposed)? Thanks.

[Bug tree-optimization/71437] [7 regression] Performance regression after r235817

2016-06-06 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71437 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38652 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38652&action=edit test-case to reproduce Need to be compiled with -O3 -m32 -ffast-math on x86-64.

[Bug tree-optimization/71437] New: [7 regression] Performance regression after r235817

2016-06-06 Thread ysrumyan at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed ~10% slowdown on one important benchmark used for Silvermont testing. I can reproduce this performance gap using attached test-case on SandyBridge

[Bug tree-optimization/71347] [7 regression] Performance drop after r235513 on x86-64 in 32-bit mode.

2016-05-30 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71347 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38600 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38600&action=edit test-case to reproduce Need to be compiled with -O2 -m32 -march=slm -ffast-math options on x86-64.

[Bug tree-optimization/71347] New: [7 regression] Performance drop after r235513 on x86-64 in 32-bit mode.

2016-05-30 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- We noticed significant regression (more than 10%) after this revision which can be illustrated on the following simple

[Bug rtl-optimization/71275] [7 regression] Performance drop after r235660 on x86-64 in 32-bit mode.

2016-05-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71275 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38564 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38564&action=edit test-case to reproduce Must be compiled with -O2 -m32 -march=slm options.

[Bug rtl-optimization/71275] New: [7 regression] Performance drop after r235660 on x86-64 in 32-bit mode.

2016-05-25 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Regression can be seen at attached test-case. In the tail block of innermost loop redundant fill was added: before

[Bug debug/70935] [6/7 Regression] ICE: verify_ssa failed (error: definition in block 9 does not dominate use in block 12) w/ -O3 -g

2016-05-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70935 --- Comment #3 from Yuri Rumyantsev --- Jakub, Here is a simple fix - do not take into consideration edges whose destination is the loop latch block, i.e. loop is endless: diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c

[Bug rtl-optimization/70873] [GCC7 Regression] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.

2016-04-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38375 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38375&action=edit test-case to reproduce Must be compiled with -O2 -mavx2 -m32 options.

[Bug rtl-optimization/70873] New: [GCC7 Regression] 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.

2016-04-29 Thread ysrumyan at gmail dot com
Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- This degradation is caused by known issue with partial register dependency: https

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-04-28 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #12 from Yuri Rumyantsev --- Created attachment 38367 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38367&action=edit modified patch

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-04-28 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #11 from Yuri Rumyantsev --- Richard, I slightly modified the patch you proposed: 1. Apply loop->safelen check only if lim is invoked before loop vectorization since its value could be incorrect (I simply add bool param to it).

[Bug tree-optimization/70849] Loop can be vectorized through gathers on AVX2 platforms.

2016-04-28 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70849 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38365 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38365&action=edit test-case to reproduce Must be compiled with -O3 -mavx2 options

[Bug tree-optimization/70849] New: Loop can be vectorized through gathers on AVX2 platforms.

2016-04-28 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Simple test which will be attached is not vectorized as not profitable: test.c:11:5: note: cost model: the vector iteration cost

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-04-19 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #6 from Yuri Rumyantsev --- Richard, I made the change you proposed but it still does not help since we have a loop-carried dependency through this_4(D)->S_n: : _5 = this_4(D)->S_n; ... : pretmp_54 = this_4(D)->C2; pretmp_57

[Bug tree-optimization/70729] Loop marked with omp simd pragma is not vectorized

2016-04-19 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70729 --- Comment #1 from Yuri Rumyantsev --- Created attachment 38309 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38309&action=edit test-case to reproduce Must be compiled with -Ofast -mavx2 -fopenmp options on x86 machine.

[Bug tree-optimization/70729] New: Loop marked with omp simd pragma is not vectorized

2016-04-19 Thread ysrumyan at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Analyzing performance of one important benchmark we found out that one of the hot loops is not vectorized since loop-invariant load of the class member has

[Bug target/70482] Optimization opportunity to vectorize basic block for -mavx target.

2016-04-01 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70482 --- Comment #2 from Yuri Rumyantsev --- Richard, The problem is in pattern matching: /* Pattern detected. */ if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_recog_widen_mult_pattern:

[Bug tree-optimization/70482] New: Optimization opportunity to vectorize basic block for -mavx target.

2016-03-31 Thread ysrumyan at gmail dot com
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- If we compile bb-slp-pattern-1.c from gcc.dg/vect suite with -mavx pattern vectorization won't happen since AVX has very

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2016-03-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #27 from Yuri Rumyantsev --- Created attachment 37940 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37940&action=edit test-case to reproduce Need to be compiled with -Ofast -mavx2 -fopenmp options.

[Bug tree-optimization/66142] Loop is not vectorized because not sufficient support for GOMP_SIMD_LANE

2016-03-11 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66142 --- Comment #26 from Yuri Rumyantsev --- If we convert copy structures to copy structure fields test will be vectorized and all mentions of GOMP_SIMD_LANE will be deleted. But if we slightly modify test by introducing new function vdot and

[Bug rtl-optimization/69633] [6 Regression] Redundant move is generated after r228097

2016-03-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633 --- Comment #3 from Yuri Rumyantsev --- Sorry for the confusion. The bug must be closed as user mistake. 2016-03-07 19:18 GMT+03:00 bernds at gcc dot gnu.org : > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633 > >

[Bug rtl-optimization/69942] gcc.dg/ifcvt-5.c FAILs

2016-02-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69942 --- Comment #3 from Yuri Rumyantsev --- Created attachment 37822 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37822&action=edit proposed patch Patch to resolve ifcvt-5.c failure.

[Bug rtl-optimization/69942] gcc.dg/ifcvt-5.c FAILs

2016-02-29 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69942 --- Comment #2 from Yuri Rumyantsev --- I attached patch which resolves failure.

[Bug rtl-optimization/69942] gcc.dg/ifcvt-5.c FAILs

2016-02-26 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69942 --- Comment #1 from Yuri Rumyantsev --- The cause of issue is that before ce1 phase pde (or pre) transformation has been done to remove partial redundant moves to variable i and j, i.e. code int i = x; int j = y; if (x > y) { i =

[Bug tree-optimization/69783] New: [6 Regression] Loop is not vectorized after r233212

2016-02-12 Thread ysrumyan at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- After changes in vect_prune_runtime_alias_test_list() a number of merging ranges was significantly decreased: Before fix improved number of alias checks

[Bug tree-optimization/69783] [6 Regression] Loop is not vectorized after r233212

2016-02-12 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69783 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37671 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37671&action=edit test-case to reproduce It needs to be compiled with -Ofast -funroll-loops on x86-64

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052 --- Comment #13 from Yuri Rumyantsev --- I checked that performance is back for the whole benchmark. Thanks a lot. Yuri. 2016-02-09 14:17 GMT+03:00 amker at gcc dot gnu.org : >

[Bug tree-optimization/69652] [6 Regression] [ICE] verify_ssa fail w/ -O2 -ffast-math -ftree-vectorize

2016-02-05 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69652 --- Comment #5 from Yuri Rumyantsev --- Jakub, I'd like to clarify one of your remarks: 5) IMHO you should give up also for !is_gimple_assign, say trying to move an elemental function call into the conditional is just wrong What's wrong in call

[Bug tree-optimization/69652] [6 Regression] [ICE] verify_ssa fail w/ -O2 -ffast-math -ftree-vectorize

2016-02-04 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69652 --- Comment #4 from Yuri Rumyantsev --- Jakub, thanks a lot for your detailed comments! I've just sent a patch for review to gcc-patches. Could you please take a look at it? Best regards. Yuri. 2016-02-03 20:22 GMT+03:00 jakub at gcc dot

[Bug tree-optimization/69652] [6 Regression] [ICE] verify_ssa fail w/ -O2 -ffast-math -ftree-vectorize

2016-02-03 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69652 --- Comment #2 from Yuri Rumyantsev --- This is my fault - forgot to fix vuse for scalar statements which are crossed by masked stores during code motion. The fix is being tested and will be sent for review tomorrow.

[Bug rtl-optimization/69633] [6 Regression] Redundant move is generated after r228097

2016-02-02 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37559 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37559&action=edit test-case to reproduce Need to be compiled with -O2 -m32 -pie -fPIE. Assume that -march=slm is not needed.

[Bug rtl-optimization/69633] New: [6 Regression] Redundant move is generated after r228097

2016-02-02 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Sorry, that we noticed this regression just now but not in September. After Makarov's fix for 61578 ( and s390 regression) we noticed

[Bug tree-optimization/69467] New: [6 Regression] Pattern X * C1 CMP 0 to X CMP 0 causes performance drop on 32-bit x86.

2016-01-25 Thread ysrumyan at gmail dot com
Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- This is caused by the same revision as 67438 http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225248

[Bug tree-optimization/69467] [6 Regression] Pattern X * C1 CMP 0 to X CMP 0 causes performance drop on 32-bit x86.

2016-01-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69467 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37462 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37462&action=edit test-case to reproduce It needs to be compiled with the -m32 option at -O2 or -O3 -funroll-loops. In the description the assembly

[Bug tree-optimization/69467] [6 Regression] Pattern X * C1 CMP 0 to X CMP 0 causes performance drop on 32-bit x86.

2016-01-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69467 --- Comment #3 from Yuri Rumyantsev --- Richard, I checked that performance is back with your patch. Thanks. 2016-01-25 17:50 GMT+03:00 rguenth at gcc dot gnu.org : >
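
For readers unfamiliar with the pattern named in the bug title, the following is a minimal sketch in C of the equivalence behind rewriting X * C1 CMP 0 as X CMP 0. It is not the attached test case: the function names and the constant 3 are assumptions chosen for illustration. The equivalence relies on signed overflow being undefined, so the check only covers a range in which x * 3 cannot overflow.

```c
#include <assert.h>

/* Hypothetical illustration, not the test case from the attachment. */

/* Comparison written on the multiplied value: X * C1 CMP 0 with C1 = 3. */
static int scaled_is_positive(int x)
{
    return x * 3 > 0;
}

/* The same comparison after the simplification: X CMP 0. */
static int plain_is_positive(int x)
{
    return x > 0;
}

/* Checks that both forms agree for every x in [lo, hi].
   The caller must keep the range small enough that x * 3 does not overflow. */
static void check_equivalence(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        assert(scaled_is_positive(x) == plain_is_positive(x));
}
```

Calling check_equivalence(-1000, 1000) exercises the case of a positive constant. With a negative constant the direction of the comparison would have to flip.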

[Bug tree-optimization/69297] [6 Regression] Performance regression after r230020

2016-01-18 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69297 --- Comment #4 from Yuri Rumyantsev --- Yes, this loop was added to avoid the DCE phase. Thanks. Yuri. 2016-01-18 13:33 GMT+03:00 rguenth at gcc dot gnu.org : > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69297 > >

[Bug tree-optimization/69297] [6 Regression] Performance regression after r230020

2016-01-15 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69297 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37356 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37356&action=edit test-case to reproduce To reproduce, compile with the -Ofast -march=core-avx2 options.

[Bug tree-optimization/69297] New: [6 Regression] Performance regression after r230020

2016-01-15 Thread ysrumyan at gmail dot com
: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- This regression was found on spec2006/464.h264ref. The problem is related to SLP vectorization of basic blocks and is caused by the wrong calculation of scalar cost, e.g

[Bug rtl-optimization/69274] New: [6 Regression] Performance regression after r231814 on x86 Haswell.

2016-01-14 Thread ysrumyan at gmail dot com
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- After this simple fix we got a huge regression (> 16%) for spec2006/435.gromacs on Haswell with "-O2 -ffast-math

[Bug rtl-optimization/67145] [6 Regression] associativity from pseudo-reg ordering

2016-01-13 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67145 --- Comment #6 from Yuri Rumyantsev --- We checked that the proposed patch does not introduce new performance regressions, and I will prepare it for review once bootstrapping and regression testing complete, likely tomorrow.

[Bug tree-optimization/68522] [6 Regression] SPEC CPU2006 435.gromacs miscomparison

2015-12-31 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68522 Yuri Rumyantsev changed: What|Removed |Added CC||ysrumyan at gmail dot com --- Comment

[Bug rtl-optimization/69052] New: [6 Regression] Performance regression after r229402.

2015-12-25 Thread ysrumyan at gmail dot com
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- In the loop_invariant phase, an additional function inv_can_prop_to_addr_use, which tries to determine if forward propagation for a cheap address is possible through call

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2015-12-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37133 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37133&action=edit test-case to reproduce It should be compiled with the -O2 -m32 options to reproduce.

[Bug rtl-optimization/67145] [6 Regression] associativity from pseudo-reg ordering

2015-12-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67145 --- Comment #3 from Yuri Rumyantsev --- Created attachment 37120 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37120&action=edit non-tested patch

[Bug rtl-optimization/67145] [6 Regression] associativity from pseudo-reg ordering

2015-12-23 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67145 --- Comment #4 from Yuri Rumyantsev --- I attached a simple untested patch which restores performance on x86. This change is not perfect, but using it I noticed a 2%-6% speed-up on the 32-bit x86 platform. The idea of the patch is very simple - we do not

[Bug rtl-optimization/68920] [6 Regression] Undesirable if-conversion for a rarely taken branch

2015-12-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68920 --- Comment #4 from Yuri Rumyantsev --- You are quite right - the cost model is very poor. We did a simple experiment and set the branch cost to 1 but noticed performance regressions on other benchmarks. When we set it to 2 we did not see any

[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed

2015-12-15 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906 --- Comment #3 from Yuri Rumyantsev --- I've prepared a simple fix which cures the ICE. I will send it for review tomorrow. 2015-12-15 12:50 GMT+03:00 jakub at gcc dot gnu.org : >

[Bug tree-optimization/68894] New: Recognition min/max pattern with multiple arguments.

2015-12-14 Thread ysrumyan at gmail dot com
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- While analyzing one important benchmark (RGB to CMYK conversion), we found that the MIN pattern is not recognized for more than 2 arguments. I attached simple

[Bug tree-optimization/68894] Recognition min/max pattern with multiple arguments.

2015-12-14 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68894 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37026 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37026&action=edit test-case to reproduce It is sufficient to compile it with the -O3 option to see the difference in the produced assembly.
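
To make the report above concrete, here is a small sketch in C of a minimum taken over three values, written once as chained conditional updates and once as nested two-argument minimums. It is not the attached test case, and the names min2, min3_chained, min3_nested and check_min3 are hypothetical. The report does not say which form the benchmark uses, so treat this only as an illustration of what "MIN with more than 2 arguments" can look like in source code.

```c
#include <assert.h>

/* Hypothetical illustration, not the test case from the attachment. */

/* Two-argument minimum. */
static int min2(int a, int b)
{
    return a < b ? a : b;
}

/* Minimum of three values as chained conditional updates. */
static int min3_chained(int a, int b, int c)
{
    int m = a;
    if (b < m)
        m = b;
    if (c < m)
        m = c;
    return m;
}

/* The same result expressed as nested two-argument minimums. */
static int min3_nested(int a, int b, int c)
{
    return min2(min2(a, b), c);
}

/* Checks that both forms agree for all triples with values in [lo, hi]. */
static void check_min3(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            for (int c = lo; c <= hi; c++)
                assert(min3_chained(a, b, c) == min3_nested(a, b, c));
}
```

Compiling both forms with -O3 and comparing the generated assembly is one way to see whether the compiler treats them alike.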
