http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Bug #: 55342
Summary: [LRA,x86] Non-optimal code for simple loop with LRA
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-11-19
12:06:20 UTC ---
The patched compiler produces better binaries but we still have a -6%
performance degradation on corei7. The main cause of it is that LRA compiler
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
Bug #: 55731
Summary: Issue with complete innermost loop unrolling
(cunrolli)
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-12-18
14:23:30 UTC ---
Created attachment 28997
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28997
testcase1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-12-18
14:24:05 UTC ---
Created attachment 28998
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28998
testcase2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-12-19
09:17:40 UTC ---
(In reply to comment #3)
The reason is that unrolling early can be harmful to, for example, vectorization
and thus cunrolli restricts itself
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
Bug #: 55970
Summary: [x86] Avoid reverse order of function argument
gimplifying
Classification: Unclassified
Product: gcc
Version: unknown
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-14
14:46:48 UTC ---
Created attachment 29162
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29162
testcase
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-14
15:15:52 UTC ---
I pointed out that this code is not C standard compliant but it occurred in
a customer application that should be ported to the x86 platform. This bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787
--- Comment #14 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-22
15:32:06 UTC ---
Created attachment 29250
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29250
testcase in F90
Reproducer for IPA_CP
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56129
Bug #: 56129
Summary: Seg fault on 256.bzip2 from spec2000 with -lto and
pre-reload scheduler for x86 Atom
Classification: Unclassified
Product: gcc
Version: 4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151
Bug #: 56151
Summary: Performance degradation after r194054 on x86 Atom.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-30
15:20:07 UTC ---
Created attachment 29306
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29306
testcase to reproduce
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-01
13:34:20 UTC ---
I sent for review a simple fix that can be considered a workaround for this
issue - I simply change the macro
#define PUSH_ARGS_REVERSED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
Bug #: 56175
Summary: Issue with combine phase on x86.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-01
15:58:49 UTC ---
Created attachment 29330
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29330
testcase
This test must be compiled with the following options
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-04
08:55:12 UTC ---
Created attachment 29345
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29345
real test-case
Needs to be compiled with the proposed options.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56223
Bug #: 56223
Summary: Integer ABS is not recognized for more complicated
pattern
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56223
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-06
11:41:21 UTC ---
Created attachment 29367
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29367
testcase to reproduce
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
To reproduce it is sufficient to compile only one source file -
StaggeredLeapfrog2.F which must be preprocessed. In rtl-dump after reload we
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58826
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
In fact LRA is responsible for this failure - there is a bug in constant
regeneration. LRA correctly regenerates all occurrences of virtual register
which is not allocated (i.e
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
After the fix for memcpy/memset expansion on the x86 target we encountered an ICE on the
following simple test:
void
my_memcpy (char *dest, const char *src, int n)
{
  __builtin_memcpy (dest, src, n);
}
which
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58384
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I checked that the fix for PR58831 does not cure the issue, but we can close
this bug since 253.perlbmk now passes with the pre-reload scheduler.
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
After the patch to improve register preferencing in IRA and to *remove regmove*
pass we noticed performance degradation on several benchmarks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59036
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 31178
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31178&action=edit
test-case to reproduce
The test needs to be compiled with the -m32 option for any x86 target.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 31217
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31217&action=edit
Additional patch for r203634.
See my comments.
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
I attached a simple reproducer to compile with -march=core-avx2 -c -O3
-ffast-math options (-ffast-math is essential) and got the following ICE:
t.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59133
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 31219
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31219&action=edit
test-case to reproduce
Needs to be compiled with -m32 -march=core-avx2 -O3 -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59133
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
It is worth noting that the -m32 option is also essential for reproducing it.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
--- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Hi Uros,
I decided that the bug owner should fix it and send my patch (or
modified one) for review to GCC community, i.e. I was not planning to
fix it. But if I should do it pls
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58826
--- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I assume that this bug should be closed since it is not reproducible after the
latest LRA fix.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
It turned out that the proposed fix does not help trunk compilers since now another
huge routine is inlined first (read_input) and for perdida we got the
following message
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 31348
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31348&action=edit
test-case to reproduce
It needs to be compiled with the -Ofast -flto options to reproduce.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I saw that on old compiler sources (dated 20130911) with my patch 'perdida'
was inlined without any additional inline parameters (using -flto) but now it
is not inlined since
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54240
Bug #: 54240
Summary: Routine hoist_adjacent_loads does not work properly
after r189366
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54241
Bug #: 54241
Summary: Routine hoist_adjacent_loads does not work properly
after r189366
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54935
Bug #: 54935
Summary: No way to do if conversion
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54935
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
Version|unknown |4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Bug #: 54939
Summary: Very poor vectorization of loops with complex
arithmetic
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-10-16
14:54:50 UTC ---
Created attachment 28455
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28455
test reproducer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com 2012-10-16
15:06:19 UTC ---
I looked through the list of all issues related to vectorization but could not
find a duplicate.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-11
13:42:49 UTC ---
This pattern is already recognized by simplify_bitwise_binary but only for
usual int type, i.e. if we change all short types to the ordinary int
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-12
13:05:16 UTC ---
(In reply to comment #6)
(In reply to comment #5)
This pattern is already recognized by simplify_bitwise_binary but only for
usual int type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-12
14:43:53 UTC ---
(In reply to comment #8)
(In reply to comment #7)
(In reply to comment #6)
(In reply to comment #5)
This pattern is already recognized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-14
12:03:37 UTC ---
I did measurements of 3 possible fixes:
1. Comment out 2 patterns related to type sinking.
2. Comment out 1st pattern only.
3. Prohibit type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
Bug #: 56415
Summary: Performance regression after fix for 56273
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-21
08:59:33 UTC ---
Created attachment 29515
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29515
testcase
Test must be compiled with -O3 -funroll-loops options.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-21
09:11:21 UTC ---
This bug was introduced by the following fix:
r195940 is guilty:
Author: rguenth
Date: Mon Feb 11 13:33:19 2013
New Revision: 195940
URL
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56483
Bug #: 56483
Summary: LTO issue with expanding GIMPLE_COND
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56483
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-28
14:43:22 UTC ---
Created attachment 29551
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29551
test-case to reproduce
Test must be compiled with -O2 -flto -fno
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56595
Bug #: 56595
Summary: Tree-ssa-pre can create loop carried dependencies
which prevent loop vectorization.
Classification: Unclassified
Product: gcc
Version: 4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56595
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-03-11
13:38:25 UTC ---
Created attachment 29636
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29636
testcase
This test must be compiled with the following options
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56688
Bug #: 56688
Summary: Fortran save statement prevents loop vectorization.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56717
Bug #: 56717
Summary: Enhance Dot-product pattern recognition to avoid mult
widening.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56778
Bug #: 56778
Summary: ICE on several benchmarks after r196775.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56799
Bug #: 56799
Summary: Runfail after r197060+r197082.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56799
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-01
14:51:28 UTC ---
It is sufficient to compile the test with the '-O2' option.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
Bug #: 56812
Summary: Simple loop is not SLP-vectorized after r196872
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-02
11:22:45 UTC ---
Created attachment 29775
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29775
testcase
Need to compile with -O3 -funroll-loops options.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-02
11:41:23 UTC ---
Sorry, I made a typo in the -march option - it must be -march=corei7 -mavx.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-02
13:27:15 UTC ---
Yes, the test-case is correct. If we delete your changes we get the following
(with -ftree-vectorizer-verbose-3):
t.cc:12: note: vectorizing stmts
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56826
Bug #: 56826
Summary: Run-fail after r197189.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56826
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-03
09:48:59 UTC ---
Created attachment 29789
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29789
test-case to reproduce
To reproduce the failure the test must
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-08
14:03:45 UTC ---
Richard,
We found out another issue related to your fix (r196872), namely for the
attached test-case t1.c function vect_gen_niters_for_prolog_loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #13 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-08
14:05:26 UTC ---
Created attachment 29824
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29824
testcase
The following options were used to compile on x86
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
Bug #: 56878
Summary: Issue with candidate choice in
vect_gen_niters_for_prolog_loop.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-08
15:47:05 UTC ---
Created attachment 29826
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29826
testcase
Needs to be compiled on x86 with the following options
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #15 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-08
15:48:45 UTC ---
New bug has been opened for it:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56885
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-09
13:33:53 UTC ---
I did a simple investigation and found out that
1. Test is compiled successfully without selective scheduling, i.e. with
'-fschedule-insns' only.
2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56885
--- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-09
14:22:28 UTC ---
Forgot to mention that __builtin_memset and function argument are not
interchangeable since both use the same register di.
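The register conflict described in this comment can be sketched as follows. This is only a minimal illustration, not the testcase attached to the bug: it assumes the x86-64 System V calling convention, where the first integer argument of a call is passed in the di register, and that the memset destination pointer also ends up in di. The function names consume and clear_then_call are hypothetical.

```c
#include <stddef.h>

/* Hypothetical callee (not from the bug report). Under the assumed
   x86-64 System V ABI its first integer argument is passed in %edi,
   the low half of %rdi. noinline keeps the call visible. */
__attribute__((noinline)) static int consume(int x)
{
    return x + 1;
}

/* Hypothetical caller. The memset destination pointer and the argument
   of consume() both need the di register, so the two set-up sequences
   cannot simply be swapped by an instruction scheduler. */
int clear_then_call(char *buf, size_t n, int x)
{
    __builtin_memset(buf, 0, n);   /* destination pointer is placed in %rdi */
    return consume(x);             /* x then has to be moved into %edi */
}
```

Compiling such a function with -S and inspecting the assembly should show both sequences writing the same register, which is why one has to finish before the other starts.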
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
Bug #: 56935
Summary: Basic block is not SLP-vectorized after r197635.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-12
14:01:50 UTC ---
Created attachment 29862
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29862
testcase
Needs to be compiled with the following options:
-O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-15
14:54:50 UTC ---
Richard,
both subq's access the same cache line, which means that after the 1st store
the 2nd load will stall until the data cache has been updated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #6 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-04-22
14:46:16 UTC ---
Richard,
Sorry for the trouble; we found out the real cause of the performance degradation
- code layout was changed after your fix and it caused ~5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57124
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
This is based on performance analysis of the eembc2.0 suite on an Atom processor in
comparison with the clang compiler.
I prepared a simple reproducer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30203
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30203&action=edit
testcase
Need to compile with -O2 -m32 options on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #2 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I don't believe that this is related to rtl optimizations, but rather to
inlining phase. To prove it I did small changes in t.c for remove.c (it now has
type void):
void remove
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #3 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30213
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30213&action=edit
modified testcase
This is a modified testcase which does not have a problem
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We found significant performance drop after changes in lra phase, which can be
demonstrated on example from
http://gcc.gnu.org/bugzilla
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57468
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30224
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30224&action=edit
test-case to reproduce
It should be compiled on x86 with -O2 -m32 options.
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
For the following simple test-case extracted from 254.gap (spec2000):
typedef unsigned long ul;
void foo (ul* __restrict x, ul* __restrict y, ul n)
{
  ul i;
  for (i=1; i<=n; i++, x++, y
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
I have not seen any comments about my latest note - have you any ideas about
this issue?
Thanks in advance.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56129
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #4 from Yuri Rumyantsev ysrumyan at gmail dot com ---
(In reply to Jakub Jelinek from comment #3)
By tuning I've meant the vectorizer cost model. If the desirability of
gathers vs. no vectorization at all doesn't depend only
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
Yuri Rumyantsev ysrumyan at gmail dot com changed:
What|Removed |Added
CC||ysrumyan
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
--- Comment #12 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Jan,
I tried to test your fix and got the following error message while
building trunk compiler (with your fix):
../../../../../trunk/libstdc++-v3/src/c++11/fstream-inst.cc:48:1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
--- Comment #14 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Hi Jan,
I checked that all benches from spec2000 are run successfully with
-flto options and eembc_2_0 suite was also run successfully with lto
(for 32-bit mode).
So go ahead
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
If we consider the following simple test-case
int a[100];
void foo()
{
  a[0] = a[1] = a[2] = a[3] = 0;
}
SLP vectorization of basic block takes place:
gcc -S -O3 -m32 t.c -ftree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #8 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30751
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30751&action=edit
modified test-case
Modified test-case to reproduce sub-optimal register allocation.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com ---
The issue still exists in the 4.9 compiler but we got another 30% degradation after
the r202165 fix. It can be reproduced with the modified test-case, which is attached,
with any 4.9 compiler
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We also assume that arm can have the same problem at given benchmark if -flto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58384
--- Comment #1 from Yuri Rumyantsev ysrumyan at gmail dot com ---
Created attachment 30791
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30791&action=edit
test-case to reproduce
This is a compile-only test which must be compiled with pre-reload