http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56223
Bug #: 56223
Summary: Integer ABS is not recognized for more complicated
pattern
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRME
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56223
--- Comment #1 from Yuri Rumyantsev 2013-02-06
11:41:21 UTC ---
Created attachment 29367
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29367
testcase to reproduce
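The attached testcase is not reproduced in this listing. As a purely hypothetical illustration of what the summary may be referring to, the sketch below contrasts a textbook absolute-value idiom with a less direct form of the same idea. The function names and both patterns are assumptions made for illustration, not the code from attachment 29367.

```c
/* Hypothetical illustration only; not the testcase from the bug report. */

/* Simple form: a conditional negate, the textbook abs idiom. */
int abs_simple(int x)
{
    return x < 0 ? -x : x;
}

/* A less direct form: the sign test is applied to the operands
   rather than to the computed difference. For operands whose
   difference does not overflow, this equals |a - b|. */
int abs_diff(int a, int b)
{
    int d = a - b;
    if (a < b)
        d = b - a;
    return d;
}
```

For small inputs both functions return the magnitude one would expect, for example abs_simple(-5) and abs_diff(3, 10) give 5 and 7 respectively.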
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #5 from Yuri Rumyantsev 2013-02-11
13:42:49 UTC ---
This pattern is already recognized by simplify_bitwise_binary but only for
usual int type, i.e. if we change all short types to the ordinary int (or
unsigned) this simplificat
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #7 from Yuri Rumyantsev 2013-02-12
13:05:16 UTC ---
(In reply to comment #6)
> (In reply to comment #5)
> > This pattern is already recognized by simplify_bitwise_binary but only for
> > usual int type, i.e. if we change all s
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #9 from Yuri Rumyantsev 2013-02-12
14:43:53 UTC ---
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #5)
> > > > This pattern is already recognized by simplify_bitwis
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
--- Comment #11 from Yuri Rumyantsev 2013-02-14
12:03:37 UTC ---
I did measurements of 3 possible fixes:
1. Comment out 2 patterns related to type sinking.
2. Comment out 1st pattern only.
3. Prohibit type sinking if source type (of def
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
Bug #: 56415
Summary: Performance regression after fix for 56273
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
--- Comment #1 from Yuri Rumyantsev 2013-02-21
08:59:33 UTC ---
Created attachment 29515
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29515
testcase
Test must be compiled with "-O3 -funroll-loops" options.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56415
--- Comment #2 from Yuri Rumyantsev 2013-02-21
09:11:21 UTC ---
This bug was introduced by the following fix:
r195940 is guilty:
Author: rguenth
Date: Mon Feb 11 13:33:19 2013
New Revision: 195940
URL: http://gcc.gnu.org/viewcvs?r
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56483
Bug #: 56483
Summary: LTO issue with expanding GIMPLE_COND
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Pr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56483
--- Comment #1 from Yuri Rumyantsev 2013-02-28
14:43:22 UTC ---
Created attachment 29551
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29551
test-case to reproduce
Test must be compiled with -O2 -flto -fno-inline options
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56595
Bug #: 56595
Summary: Tree-ssa-pre can create loop carried dependencies
which prevent loop vectorization.
Classification: Unclassified
Product: gcc
Version: 4.8.0
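The original testcase for this report is not shown here. As a rough sketch of the general effect named in the summary, assuming a loop that reads two adjacent elements, the second function below is a hand-written equivalent of what redundancy elimination across iterations might produce. Both function names and the loop itself are hypothetical and are not taken from the bug report.

```c
/* Hypothetical illustration only; not the testcase from the bug report. */

/* Original shape: every iteration is independent of the others. */
void sum_adjacent(int *restrict out, const int *restrict in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] + in[i + 1];
}

/* Hand-transformed shape: the value loaded as in[i + 1] is kept and
   reused as in[i] on the next iteration, so one load per iteration is
   saved at the price of a value carried from iteration to iteration. */
void sum_adjacent_carried(int *restrict out, const int *restrict in, int n)
{
    if (n <= 0)
        return;
    int prev = in[0];
    for (int i = 0; i < n; i++) {
        int next = in[i + 1];
        out[i] = prev + next;
        prev = next;
    }
}
```

Both versions compute the same result; the difference is only that the second form has a scalar (prev) flowing between iterations, which is the kind of dependence a loop vectorizer then has to recognize or give up on.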
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56595
--- Comment #1 from Yuri Rumyantsev 2013-03-11
13:38:25 UTC ---
Created attachment 29636
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29636
testcase
This test must be compiled with the following options for x86:
-ffree-line-le
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56688
Bug #: 56688
Summary: Fortran save statement prevents loop vectorization.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: norma
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56717
Bug #: 56717
Summary: Enhance Dot-product pattern recognition to avoid mult
widening.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCON
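The reproducer for this report is not part of the listing. For orientation, a typical dot-product loop over 16-bit inputs with a 32-bit accumulator looks like the sketch below. The function name is hypothetical, and whether the report concerns exactly this shape is an assumption.

```c
/* Hypothetical illustration only; not the testcase from the bug report. */

/* Dot product of two arrays of 16-bit values, accumulated in 32 bits.
   Each short is promoted to int before the multiply. */
int dot_s16(const short *a, const short *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

On x86, an instruction such as pmaddwd multiplies pairs of signed 16-bit elements and adds adjacent 32-bit products, which is why recognizing the loop as a dot product can avoid a separate widening multiply step.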
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56778
Bug #: 56778
Summary: ICE on several benchmarks after r196775.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56799
Bug #: 56799
Summary: Runfail after r197060+r197082.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56799
--- Comment #1 from Yuri Rumyantsev 2013-04-01
14:51:28 UTC ---
It is sufficient to compile the test with the '-O2' option.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
Bug #: 56812
Summary: Simple loop is not SLP-vectorized after r196872
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #1 from Yuri Rumyantsev 2013-04-02
11:22:45 UTC ---
Created attachment 29775
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29775
testcase
Need to compile with -O3 -funroll-loops options.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #2 from Yuri Rumyantsev 2013-04-02
11:41:23 UTC ---
Sorry, I made a typo in the -march option: it must be -march=corei7 -mavx.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #4 from Yuri Rumyantsev 2013-04-02
13:27:15 UTC ---
Yes, the test-case is correct. If we remove your changes we get the following
(with -ftree-vectorizer-verbose-3):
t.cc:12: note: vectorizing stmts using SLP.BASIC BLOCK VEC
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56826
Bug #: 56826
Summary: Run-fail after r197189.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56826
--- Comment #1 from Yuri Rumyantsev 2013-04-03
09:48:59 UTC ---
Created attachment 29789
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29789
test-case to reproduce
To reproduce the failure the test must be compiled on x86 corei7 w
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #12 from Yuri Rumyantsev 2013-04-08
14:03:45 UTC ---
Richard,
We found out another issue related to your fix (r196872), namely for the
attached test-case t1.c function vect_gen_niters_for_prolog_loop() uses
non-invariant poi
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #13 from Yuri Rumyantsev 2013-04-08
14:05:26 UTC ---
Created attachment 29824
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29824
testcase
The following options were used to compile on x86:
-O3 -funroll-loops -ffast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
Bug #: 56878
Summary: Issue with candidate choice in
vect_gen_niters_for_prolog_loop.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCON
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
--- Comment #1 from Yuri Rumyantsev 2013-04-08
15:47:05 UTC ---
Created attachment 29826
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29826
testcase
Need to be compiled on x86 with the following options:
-O3 -funroll-loops -ffas
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #15 from Yuri Rumyantsev 2013-04-08
15:48:45 UTC ---
New bug has been opened for it:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56885
--- Comment #5 from Yuri Rumyantsev 2013-04-09
13:33:53 UTC ---
I did a simple investigation and found out that
1. Test is compiled successfully without selective scheduling, i.e. with
'-fschedule-insns' only.
2. The problem is that selecti
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56885
--- Comment #6 from Yuri Rumyantsev 2013-04-09
14:22:28 UTC ---
Forgot to mention that __builtin_memset and function argument are not
interchangeable since both use the same register di.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
Bug #: 56935
Summary: Basic block is not SLP-vectorized after r197635.
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #1 from Yuri Rumyantsev 2013-04-12
14:01:50 UTC ---
Created attachment 29862
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29862
testcase
Need to be compiled with the following options:
-O3 -mavx -march=corei7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #4 from Yuri Rumyantsev 2013-04-15
14:54:50 UTC ---
Richard,
both subq's access the same cache line, which means that after the 1st store
the 2nd load will stall until the data cache has been updated (this is not an exact
explanatio
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
--- Comment #6 from Yuri Rumyantsev 2013-04-22
14:46:16 UTC ---
Richard,
Sorry for the trouble: we found out the real cause of the performance degradation
- the code layout changed after your fix and that caused a ~5% slowdown on
253.perlbmk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54240
Bug #: 54240
Summary: Routine hoist_adjacent_loads does not work properly
after r189366
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54241
Bug #: 54241
Summary: Routine hoist_adjacent_loads does not work properly
after r189366
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54935
Bug #: 54935
Summary: No way to do if conversion
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54935
Yuri Rumyantsev changed:
What|Removed |Added
Version|unknown |4.8.0
--- Comment #1 from Yur
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Bug #: 54939
Summary: Very poor vectorization of loops with complex
arithmetic
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #2 from Yuri Rumyantsev 2012-10-16
14:54:50 UTC ---
Created attachment 28455
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28455
test reproducer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
--- Comment #3 from Yuri Rumyantsev 2012-10-16
15:06:19 UTC ---
I looked through the list of all issues related to vectorization but could not
find duplicate.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57124
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
This is based on performance analysis of eembc2.0 suite on Atom processor in
comparison with clang compiler.
I prepared a simple reproducer that
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 30203
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30203&action=edit
testcase
Need to compile with "-O2 -m32" options on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #2 from Yuri Rumyantsev ---
I don't believe that this is related to rtl optimizations, but rather to
inlining phase. To prove it I did small changes in t.c for remove.c (it now has
type void):
void remove (node ** head, node* elt)
{
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #3 from Yuri Rumyantsev ---
Created attachment 30213
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30213&action=edit
modified testcase
This is a modified testcase which does not have the problem with a redundant move
instruction in in
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We found significant performance drop after changes in lra phase, which can be
demonstrated on example from
http://gcc.gnu.org/bugzilla
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57468
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 30224
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30224&action=edit
test-case to reproduce
It should be compiled on x86 with "-O2 -m32" options.
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
For the following simple test-case extracted from 254.gap (spec2000):
typedef unsigned long ul;
void foo (ul* __restrict x, ul* __restrict y, ul n)
{
ul i;
for (i=1; i<=n; i++, x++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57430
--- Comment #4 from Yuri Rumyantsev ---
I have not seen any comments on my latest note. Do you have any ideas about
this issue?
Thanks in advance.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56129
Yuri Rumyantsev changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #4 from Yuri Rumyantsev ---
(In reply to Jakub Jelinek from comment #3)
> By tuning I've meant the vectorizer cost model. If the desirability of
> gathers vs. no vectorization at all doesn't depend only on the insns in the
> loop, but
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
--- Comment #12 from Yuri Rumyantsev ---
Jan,
I tried to test your fix and got the following error message while
building trunk compiler (with your fix):
../../../../../trunk/libstdc++-v3/src/c++11/fstream-inst.cc:48:1:
error: node is alias but
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57602
--- Comment #14 from Yuri Rumyantsev ---
Hi Jan,
I checked that all benches from spec2000 are run successfully with
-flto options and eembc_2_0 suite was also run successfully with lto
(for 32-bit mode).
So go ahead and commit your fix.
Best re
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
If we consider the following simple test-case
int a[100];
void foo()
{
a[0] = a[1] = a[2] = a[3] = 0;
}
SLP vectorization of basic block takes place:
gcc -S -O3 -m32 t.c -ftree
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #8 from Yuri Rumyantsev ---
Created attachment 30751
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30751&action=edit
modified test-case
Modified test-case to reproduce sub-optimal register allocation.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #9 from Yuri Rumyantsev ---
The issue still exists in 4.9 compiler but we got another 30% degradation after
r202165 fix. It can be reproduced with the modified test-case, which is attached,
with any 4.9 compiler, namely code produced for inn
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We also assume that arm can have the same problem at given benchmark if -flto
is
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58384
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 30791
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30791&action=edit
test-case to reproduce
This is compile only test which must be compiled with pre-reload scheduler,
i.e.
with '-
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #10 from Yuri Rumyantsev ---
After fix rev. 202468 assembly looks slightly better but we met with another RA
inefficiency which can be illustrated on the attached (t1.c) test compiled with
options "-march=atom -mtune=atom -m32 -O2" tha
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #11 from Yuri Rumyantsev ---
Created attachment 30816
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30816&action=edit
test-case to reproduce
t1.c must be compiled on x86 with options:
-O2 -march=atom -mtune=atom -mfpmath=sse -m
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We found out that phase loop distribution is responsible for it, namely wrong
cfg is generated (after ldist) for pdv.f if it was compiled with options
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
We noticed a significant performance regression on an important bench from the eembc2.0
suite, which can be exhibited with the attached test-case.
Assembly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58459
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 30850
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30850&action=edit
test-case to reproduce
Test must be compiled on x86 with options -Ofast -m32 -march=atom -mtune=atom
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
To reproduce it is sufficient to compile only one source file -
StaggeredLeapfrog2.F which must be preprocessed. In rtl-dump after reload we
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58826
--- Comment #2 from Yuri Rumyantsev ---
In fact LRA is responsible for this failure - there is a bug in constant
regeneration. LRA correctly regenerates all occurrences of virtual register
which is not allocated (i.e. it does not have a register) bu
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
After fix for memcpy/memset expansion on x86 target we have met with ICE on the
following simple test:
void
my_memcpy (char *dest, const char *src, int n)
{
__builtin_memcpy (dest, src, n);
}
which
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58384
--- Comment #4 from Yuri Rumyantsev ---
I checked that the fix for PR58831 does not cure the issue, but we can close
this bug since 253.perlbmk is passed now with pre-reload scheduler.
: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
After patch to improve register preferencing in IRA and to *remove regmove*
pass we noticed performance degradation on several benchmarks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59036
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 31178
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31178&action=edit
test-case to reproduce
The test needs to be compiled with the -m32 option for any x86 target.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
--- Comment #7 from Yuri Rumyantsev ---
Created attachment 31217
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31217&action=edit
Additional patch for r203634.
See my comments.
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
I attached a simple reproducer to compile with -march=core-avx2 -c -O3
-ffast-math options (-ffast-math is essential) and got the following ICE:
t.c: In
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59133
--- Comment #1 from Yuri Rumyantsev ---
Created attachment 31219
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31219&action=edit
test-case to reproduce
Need to be compiled with -m32 -march=core-avx2 -O3 -ffast-math options to
reproduce ICE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59133
--- Comment #2 from Yuri Rumyantsev ---
It is worth noting that -m32 option is also essential for reproducing.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57756
--- Comment #9 from Yuri Rumyantsev ---
Hi Uros,
I decided that the bug owner should fix it and send my patch (or
modified one) for review to GCC community, i.e. I was not planning to
fix it. But if I should do it pls let me know and I send it to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58826
--- Comment #6 from Yuri Rumyantsev ---
I assume that this bug should be closed since it is not reproducible after the
latest LRA fix.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
--- Comment
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #4 from Yuri Rumyantsev ---
It turned out that proposed fix does not help trunk compilers since now another
huge routine is inlined first (read_input) and for perdida we got the
following message:
not inlinable: iztaccihuatl/17 -> p
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #5 from Yuri Rumyantsev ---
Created attachment 31348
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31348&action=edit
test-case to reproduce
It needs to be compiled with -Ofast -flto options to reproduce.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58721
--- Comment #7 from Yuri Rumyantsev ---
I saw that on old compiler sources (dated by 20130911) with my patch 'perdida'
was inlined without any additional inline parameters (using -flto) but now it
is not inlined since another large function read
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
Bug #: 55342
Summary: [LRA,x86] Non-optimal code for simple loop with LRA
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: norma
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342
--- Comment #2 from Yuri Rumyantsev 2012-11-19
12:06:20 UTC ---
The patching compiler produces better binaries but we still have -6%
performance degradation on corei7. The main cause of it is that the LRA compiler
generates spill of 'pure' byt
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
Bug #: 55731
Summary: Issue with complete innermost loop unrolling
(cunrolli)
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #1 from Yuri Rumyantsev 2012-12-18
14:23:30 UTC ---
Created attachment 28997
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28997
testcase1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #2 from Yuri Rumyantsev 2012-12-18
14:24:05 UTC ---
Created attachment 28998
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28998
testcase2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #4 from Yuri Rumyantsev 2012-12-19
09:17:40 UTC ---
(In reply to comment #3)
> The reason is that unrolling early can be harmful to for example vectorization
> and thus cunrolli restricts itself to "obviously" profitable cases.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
Bug #: 55970
Summary: [x86] Avoid reverse order of function argument
gimplifying
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
--- Comment #1 from Yuri Rumyantsev 2013-01-14
14:46:48 UTC ---
Created attachment 29162
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29162
testcase
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55970
--- Comment #3 from Yuri Rumyantsev 2013-01-14
15:15:52 UTC ---
I pointed out that this code is not C standard compliant but it occurred in
customer application that should be ported to x86 platform. This bug is not
issued by gcc and very
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787
--- Comment #14 from Yuri Rumyantsev 2013-01-22
15:32:06 UTC ---
Created attachment 29250
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29250
testcase in F90
Reproducer for IPA_CP
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787
Yuri Rumyantsev changed:
What|Removed |Added
CC||ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56129
Bug #: 56129
Summary: Seg fault on 256.bzip2 from spec2000 with -lto and
pre-reload scheduler for x86 Atom
Classification: Unclassified
Product: gcc
Version: 4.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151
Bug #: 56151
Summary: Performance degradation after r194054 on x86 Atom.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal