[Bug target/95285] AArch64:aarch64 medium code model proposal

2020-12-10 Thread wdijkstr at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #16 from

[Bug tree-optimization/94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-04-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94442 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug tree-optimization/91322] [10 regression] alias-4 test failure

2020-04-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91322 --- Comment #10 from Wilco --- (In reply to Christophe Lyon from comment #6) > Created attachment 48184 [details] > GCC passes dumps So according to that, in 105t.vrp1 it removes the branch and unconditionally calls abort: Folding statement:

[Bug tree-optimization/91322] [10 regression] alias-4 test failure

2020-03-26 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91322 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug rtl-optimization/94026] combine missed opportunity to simplify comparisons with zero

2020-03-20 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94026 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #5 from

[Bug rtl-optimization/93565] Combine duplicates count trailing zero instructions

2020-02-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #6 from

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-10-10 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #27 from

[Bug target/91766] -fvisibility=hidden during -fpic still uses GOT indirection on arm64

2019-09-14 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91766 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #4 from

[Bug middle-end/91753] Bad register allocation of multi-register types

2019-09-12 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91753 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug fortran/91690] Slow IEEE intrinsics

2019-09-09 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91690 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #6 from

[Bug middle-end/66462] GCC isinf/isnan/... builtins cause sNaN exceptions

2019-09-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462 --- Comment #12 from Wilco --- (In reply to Segher Boessenkool from comment #11) > I currently have > > === > diff --git a/gcc/builtins.c b/gcc/builtins.c > index ad5135c..bc3d318 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@

[Bug tree-optimization/91144] [10 regression] 176.gcc miscompare after r273294

2019-07-11 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91144 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug tree-optimization/90838] Detect table-based ctz implementation

2019-06-11 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #43 from Wilco --- (In reply to Douglas Mencken from comment #42) > (In reply to Wilco from comment #41) > > > So what is the disassembly now? > > $ /Developer/GCC/8.2p/PowerPC/32bit/bin/gcc -O2 -fno-inline pr78468.c > -save-temps

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #41 from Wilco --- (In reply to Douglas Mencken from comment #40) > To build it, I patched its sources with fix_gcc8_build.patch reversion > together with changes from comment #16 So what is the disassembly now? The 2nd diff still

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #38 from Wilco --- (In reply to Douglas Mencken from comment #37) > And some more in my wish list. May GCC don’t generate these > > .align 2 > > in text section? Any, each and every powerpc instruction is 32bit-wide, no >

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #33 from Wilco --- (In reply to Iain Sandoe from comment #30) > From "Mac_OS_X_ABI_Function_Calls.pdf" > > m32 calling convention > > Prologs and Epilogs > The called function is responsible for allocating its own stack frame, >

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #32 from Wilco --- (In reply to Segher Boessenkool from comment #29) > It aligns the stack to 16: > > # r3 is size, at entry > addi r3,r3,18 > ... > rlwinm r3,r3,0,0,27 > ... > neg r3,r3 >

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #26 from Wilco --- (In reply to Douglas Mencken from comment #25) > (In reply to Wilco from comment #24) > > > Yes the stage1 compiler would be fine or alternatively use > > --disable-bootstrap to get an installed compiler. > > I’m

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 --- Comment #24 from Wilco --- (In reply to Douglas Mencken from comment #22) > (In reply to Wilco from comment #21) > > > That's odd. The stack pointer is definitely 16-byte aligned in all cases > > right? > > As I know, PowerPC has no

[Bug target/85669] fail on s-case-cfn-macros: build/gencfn-macros: DEF_INTERNAL_FLT/INT_FN (%smth%) has no associated built-in functions

2018-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85669 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #21 from

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #23 from
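For readers unfamiliar with the idea in this bug's title, here is a minimal sketch of one well-known way to test divisibility by 3 without a modulo or division, using the multiplicative inverse of 3 modulo 2^32. It is an illustration only, not the patch discussed in the bug; the function names are hypothetical and the sketch assumes 32-bit unsigned operands.

```c
#include <stdint.h>

/* Reference version: relies on the compiler's handling of '%'. */
int divisible_by_3_mod(uint32_t x)
{
    return x % 3 == 0;
}

/* Multiply by the inverse of 3 modulo 2^32 (0xAAAAAAAB, since
   3 * 0xAAAAAAAB == 1 mod 2^32). Multiples of 3 map onto the range
   0 .. 0x55555555 (that is, UINT32_MAX / 3); every other value lands
   above that range. */
int divisible_by_3_mul(uint32_t x)
{
    uint32_t product = (uint32_t)(x * UINT32_C(0xAAAAAAAB));
    return product <= UINT32_C(0x55555555);
}
```

The second form needs only a multiply and a compare, which is why it can be attractive on targets where division is slow.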

[Bug tree-optimization/84114] global reassociation pass prevents fma usage, generates slower code

2018-02-10 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114 --- Comment #3 from Wilco --- (In reply to Richard Biener from comment #1) > This is probably related to targetm.sched.reassociation_width where reassoc > will widen a PLUS chain so several instructions will be executable in > parallel > without

[Bug tree-optimization/84114] global reassociation pass prevents fma usage, generates slower code

2018-02-04 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug middle-end/84071] [7/8 regression] wrong elimination of zero-extension after sign-extended load

2018-01-30 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84071 --- Comment #8 from Wilco --- (In reply to Eric Botcazou from comment #6) > > They are always written but have an undefined value. Adding 2 8-bit values > > results in a 9-bit value with WORD_REGISTER_OPERATIONS. > > If they have an undefined

[Bug rtl-optimization/81443] [8 regression] build/genrecog.o: virtual memory exhausted: Cannot allocate memory

2018-01-27 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81443 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #21 from

[Bug middle-end/84071] [7/8 regression] nonzero_bits1 of subreg incorrect

2018-01-27 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84071 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 --- Comment #18 from Wilco --- (In reply to Qing Zhao from comment #17) > (In reply to Wilco from comment #16) > > >> const char s[8] = "abcd\0abc"; // null byte in the middle of the string > >> int f2(void) { return __builtin_strcmp(s, "abc")

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 --- Comment #16 from Wilco --- (In reply to Qing Zhao from comment #15) > (In reply to Wilco from comment 14) > > The only reason we have to do a character by character comparison is > > because we > > cannot read beyond the end of a string.

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 --- Comment #9 from Wilco --- (In reply to Qing Zhao from comment #7) str(n)cmp with a constant string can be changed into memcmp if the string has a known alignment or is an array of known size. We should check the common cases are
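The claim that strcmp against a constant string can become memcmp when the other operand is an array of known size can be sketched as follows. This is a hand-written illustration, not GCC's transformation; the function names and the 8-byte array size are assumptions chosen for the example, and only the equality test is covered.

```c
#include <string.h>

/* Original form: walks both strings until a difference or a null byte. */
int eq_strcmp(const char (*buf)[8])
{
    return strcmp(*buf, "abc") == 0;
}

/* Rewritten form: "abc" plus its terminator is 4 bytes, and the array
   is known to hold 8 bytes, so a 4-byte memcmp cannot read out of
   bounds. Including the terminator in the comparison keeps "abcd"
   from matching. */
int eq_memcmp(const char (*buf)[8])
{
    return memcmp(*buf, "abc", 4) == 0;
}
```

With a plain `const char *` of unknown size, the memcmp form would not be safe in general, because the 4-byte read could run past the end of a shorter string.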

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 --- Comment #8 from Wilco --- > /home/qinzhao/Install/latest/bin/gcc -O2 t_p_1.c t_p.c > non-inlined version > 20.84user 0.00system 0:20.83elapsed 100%CPU (0avgtext+0avgdata > 360maxresident)k > 0inputs+0outputs (0major+135minor)pagefaults

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-13 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #4 from

[Bug middle-end/82479] missing popcount builtin detection

2017-10-09 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #6 from

[Bug target/82439] Missing (x | y) == x simplifications

2017-10-05 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82439 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug target/81357] Extra mov for zero extend of add

2017-09-28 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81357 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #7 from

[Bug middle-end/78468] [8 regression] libgomp.c/reduction-10.c and many more FAIL

2017-09-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78468 --- Comment #40 from Wilco --- (In reply to Eric Botcazou from comment #39) > > The existing alloca code relies on STACK_BOUNDARY being set correctly. Has > > the value been fixed already for the OS variants mentioned? If stack > > alignment

[Bug middle-end/78468] [8 regression] libgomp.c/reduction-10.c and many more FAIL

2017-09-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78468 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #38 from

[Bug target/71951] libgcc_s built with -fomit-frame-pointer on aarch64 is broken

2017-07-27 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71951 --- Comment #11 from Wilco --- (In reply to Icenowy Zheng from comment #10) > In my environment (glibc 2.25, and both the building scripts of glibc and > gcc have -fomit-frame-pointer automatically enabled), this bug is not fully > resolved yet.

[Bug target/71951] libgcc_s built with -fomit-frame-pointer on aarch64 is broken

2017-04-13 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71951 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #8 from

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484 --- Comment #31 from Wilco --- (In reply to Jan Hubicka from comment #30) > > > > When I looked at gap at the time, the main change was the reordering of a > > few > > if statements in several hot functions. Incorrect block frequencies also >

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484 --- Comment #29 from Wilco --- (In reply to Jan Hubicka from comment #28) > > On SPEC2000 the latest changes look good, compared to the old predictor gap > > improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight. > > It is rather

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-31 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #32 from Wilco --- (In reply to Bernd Edlinger from comment #31) > Sure, combine can't help, especially because it runs before split1. > > But I wondered why this peephole2 is not enabled: > > (define_peephole2 ; ldrd > [(set

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #14 from Wilco --- (In reply to Bernd Edlinger from comment #13) > I am still trying to understand why thumb1 seems to outperform thumb2. > > Obviously thumb1 does not have the shiftdi3 pattern, > but even if I remove these from

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-21 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #11 from Wilco --- (In reply to ktkachov from comment #10) > Confirmed then. Wilco, if you're working on this can you please assign it to > yourself? Unfortunately the form doesn't allow me to do anything with the headers...

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-20 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #12 from Wilco --- It looks like we need a different approach, I've seen the extra SETs use up more registers in some cases, and in other cases being optimized away early on... Doing shift expansion at the same time as all other DI

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-20 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #8 from Wilco --- (In reply to Bernd Edlinger from comment #7) > (In reply to Richard Earnshaw from comment #6) > > (In reply to Bernd Edlinger from comment #5) > > > (In reply to Wilco from comment #4) > > > > However dealing with

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-19 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #4 from Wilco --- (In reply to Bernd Edlinger from comment #3) > (In reply to Wilco from comment #2) > > (In reply to Bernd Edlinger from comment #1) > > > some background about this bug can be found here: > > > > > >

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-19 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug tree-optimization/32650] Convert p+strlen(p) to strchr(p, '\0') if profitable

2016-09-28 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32650 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug middle-end/77580] New: Improve devirtualization

2016-09-13 Thread wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- A commonly used benchmark contains a hot loop which calls one of 2 virtual functions via a static variable which is set just before. A reduced example is: int f1(int x) { return x + 1; } int f2

[Bug middle-end/77568] [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77568 --- Comment #5 from Wilco --- (In reply to Andrew Pinski from comment #2) > Note there are two different issues here. Well they are 3 examples of the same underlying issue - don't do a CSE when it's not profitable. How they are resolved might

[Bug middle-end/77568] [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77568 --- Comment #3 from Wilco --- (In reply to Andrew Pinski from comment #1) > I think this is just a pass ordering issue. We create fmas after PRE. > Maybe we should do it both before and after ... > Or enhance the pass which produces FMA to

[Bug middle-end/77568] New: [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 Thread wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The recently introduced code hoisting aggressively moves common subexpressions that might otherwise be mergeable with other

[Bug tree-optimization/65068] Improve rewriting for address type induction variables in IVOPT

2016-09-08 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #3 from

[Bug middle-end/77484] New: Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-09-05 Thread wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- Changes in the static branch predictor (around August last year) caused regressions on SPEC2000. The PRED_CALL predictor causes GAP

[Bug tree-optimization/66946] Spurious uninitialized warning

2016-09-05 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66946 Wilco changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/77455] [AArch64] eh_return implementation fails

2016-09-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77455 Wilco changed: What|Removed |Added Target||AArch64 Known to fail|

[Bug target/77455] New: [AArch64] eh_return implementation fails

2016-09-02 Thread wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The __builtin_eh_return implementation on AArch64 generates incorrect code for many cases due to using an incorrect offset/pointer when writing the new return address to the stack

[Bug tree-optimization/71026] Missing division optimizations

2016-08-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 --- Comment #3 from Wilco --- (In reply to ktkachov from comment #2) > The transforms > > int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f > int f5(float x) { return (x / 2.0f) <= 0.0f; }// -> x <= 0.0f > > can be

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-08-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #10 from

[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC

2016-08-23 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #27 from

[Bug middle-end/71443] New: [7 regression] test case gcc.dg/plugin/must-tail-call-2.c reports error

2016-06-07 Thread wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- There are 2 new failures in the tail-call-2.c test on recent trunk builds: FAIL: gcc.dg/plugin/must-tail-call-2.c -fplugin

[Bug rtl-optimization/71022] GCC prefers register moves over move immediate

2016-05-10 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71022 --- Comment #2 from Wilco --- (In reply to Richard Biener from comment #1) > IRA might choose to do this as part of life-range splitting/shortening. Note > that reg-reg moves may be cheaper code-size wise (like on CISC archs with > non-fixed

[Bug tree-optimization/71026] New: Missing division optimizations

2016-05-09 Thread wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- With -Ofast GCC doesn't reassociate constant multiplies or negates away from divisors to allow for more reciprocal division optimizations. It is also possible to avoid divisions

[Bug rtl-optimization/71022] New: GCC prefers register moves over move immediate

2016-05-09 Thread wdijkstr at arm dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- When assigning the same immediate value to different registers, GCC will always CSE the immediate and emit a register move for subsequent uses. This creates

[Bug rtl-optimization/70961] Regrename ignores preferred_rename_class

2016-05-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961 --- Comment #5 from Wilco --- As for a simple example, Proc_4 in Dhrystone is a good one. With -O2 and -fno-rename-registers I get the following on Thumb-2: 00c8 : c8: b430 push {r4, r5} ca: f240 0300 movw r3,

[Bug rtl-optimization/70961] Regrename ignores preferred_rename_class

2016-05-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961 --- Comment #3 from Wilco --- (In reply to Eric Botcazou from comment #2) > Pass #2 ignores it since the preference simply couldn't be honored. In which case it should not rename that chain rather than just ignore the preference (and a

[Bug rtl-optimization/70961] New: Regrename ignores preferred_rename_class

2016-05-05 Thread wdijkstr at arm dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- When deciding which register to use regrename.c calls the target function preferred_rename_class. However in pass 2 in find_rename_reg it then just ignores this preference

[Bug rtl-optimization/70946] Bad interaction between IVOpt and loop unrolling

2016-05-04 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70946 --- Comment #1 from Wilco --- PR36712 seems related to this

[Bug rtl-optimization/70946] New: Bad interaction between IVOpt and loop unrolling

2016-05-04 Thread wdijkstr at arm dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- IVOpt chooses between using indexing for induction variables or incrementing pointers. Due to the way loop unrolling works, a decision that is optimal if unrolling

[Bug middle-end/70861] Improve code generation of switch tables

2016-04-28 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70861 --- Comment #3 from Wilco --- (In reply to Andrew Pinski from comment #2) > Note I think if we had gotos instead of assignment here we should do the > similar thing for the switch table itself. Absolutely, that was my point. > Note also the

[Bug middle-end/70861] New: Improve code generation of switch tables

2016-04-28 Thread wdijkstr at arm dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- GCC uses a very basic check to determine whether to use a switch table. A simple example from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11823 still generates a huge table

[Bug middle-end/70802] New: IRA memory cost calculation incorrect for immediates

2016-04-26 Thread wdijkstr at arm dot com
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The following code in ira-costs.c tries to improve the memory cost for rematerializeable loads. There are several issues with this though: 1. The memory cost can

[Bug middle-end/70801] New: IRA caller-saves does not support rematerialization

2016-04-26 Thread wdijkstr at arm dot com
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- GCC emits the same code for caller-saves in all cases, even if the caller-save is an immediate which can be trivially rematerialized. The caller-save code should

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-19 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #20 from Wilco --- (In reply to Richard Henderson from comment #19) > I wish that message had been a bit more complete with the description > of the performance issue. I must guess from this... > > > ldr dst1, [reg_base1,

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-11 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #17 from Wilco --- (In reply to Jiong Wang from comment #16) > * for the second patch at #c10, if we always do the following no matter > op0 is virtual & eliminable or not > > "op1 = force_operand (op1, NULL_RTX);" >

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-10 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #15 from Wilco --- (In reply to Richard Biener from comment #14) > The regression in the original description looks severe enough to warrant > some fixing even if regressing some other cases. Agreed, I think the improvement from

[Bug middle-end/70140] New: Inefficient expansion of __builtin_mempcpy

2016-03-08 Thread wdijkstr at arm dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The expansion of __builtin_mempcpy is inefficient on many targets (eg. AArch64, ARM, PPC). The issue is due to not using the same expansion options that memcpy uses

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-07 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #12 from Wilco --- (In reply to Jiong Wang from comment #11) > (In reply to Richard Henderson from comment #10) > > Created attachment 37890 [details] > > second patch > > > > Still going through full testing, but I wanted to post

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-04 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #9 from Wilco --- (In reply to H.J. Lu from comment #8) > Inlining mempcpy uses a callee-saved register: > ... > > Not inlining mempcpy is preferred. If codesize is the only thing that matters... The cost is not at the caller side

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #6 from Wilco --- (In reply to Jakub Jelinek from comment #4) > Note the choice of this in a header file is obviously wrong, if you at some > point fix this up, then apps will still call memcpy rather than mempcpy, > even when the

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #5 from Wilco --- (In reply to Jakub Jelinek from comment #3) > If some arch in glibc implements memcpy.S and does not implement mempcpy.S, > then obviously the right fix is to add mempcpy.S for that arch, usually it > is just a

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #5 from Wilco --- (In reply to amker from comment #4) > (In reply to ktkachov from comment #3) > > Started with r233136. > > That's why I forced base+offset out of memory reference and kept register > scaling in in the first place.

[Bug target/70048] [AArch64] Inefficient local array addressing

2016-03-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #1 from Wilco --- The regression seems to have appeared on trunk around Feb 3-9.

[Bug target/70048] New: [AArch64] Inefficient local array addressing

2016-03-02 Thread wdijkstr at arm dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The following example generates very inefficient code on AArch64: int f1(int i) { int p[1000]; p[i] = 1; return p[i + 10] + p[i + 20]; } f1: sub sp, sp, #4000

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-17 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #41 from Wilco --- (In reply to Jerry DeLisle from comment #40) > Do you have a reduced test case of the Fortran code we can look at? See comment 13/14, the same common array is declared with different sizes in various places. > I

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-05 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #29 from Wilco --- (In reply to rguent...@suse.de from comment #28) > On Fri, 5 Feb 2016, alalaw01 at gcc dot gnu.org wrote: > > Should I raise a new bug for this, as both this and 53068 are CLOSED? > > I think this has been

[Bug c++/69657] [6 Regression] abs() not inlined after including math.h

2016-02-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69657 --- Comment #5 from Wilco --- (In reply to Andrew Pinski from comment #4) > (In reply to Jonathan Wakely from comment #3) > > Recategorising as component=c++, and removing the regression marker (because > > the change in libstdc++ that reveals

[Bug libstdc++/69657] New: [6 Regression] abs() not inlined after including math.h

2016-02-03 Thread wdijkstr at arm dot com
Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- Since a recent C++ header change abs() no longer gets inlined if we include an unrelated header before it. #include #include int wrap_abs

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #5 from Wilco --- Proposed patch: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00206.html

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #2 from Wilco --- Changing to c = 3 generates code after a short time. The issue is recursive calls to expand_ccmp_expr during the 2 possible options tried to determine costs. That makes the algorithm exponential. A fix would be to

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #3 from Wilco --- A simple workaround is to calculate cost1 early and only try the 2nd option if the cost is low (ie. it's not a huge expression that may evaluate into lots of ccmps). A slightly more advanced way would be to walk

[Bug tree-optimization/69368] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #5 from Wilco --- This still fails on AArch64 in exactly the same way with latest trunk - can someone reopen this? I don't seem to have the right permissions... (In reply to Richard Biener from comment #4) > So - can you please

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #8 from Wilco --- In a few functions GCC decides that the assignments in loops are redundant. The loops still execute but have their loads and stores removed. Eg. the first DO loop in MP2NRG should be: .L1027:

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #6 from Wilco --- This still fails on AArch64 in exactly the same way with latest trunk - can someone reopen this? I don't seem to have the right permissions... (In reply to Richard Biener from comment #4) > So - can you please

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #9 from Wilco --- The loops get optimized away in dom2. The info this phase emits is hard to figure out, so it's not obvious why it thinks the array assignments are redundant (the array is used all over the place so clearly cannot be

[Bug tree-optimization/69336] Constant value not detected

2016-01-29 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69336 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #13 from

[Bug target/69416] [6 Regression] Nonsense rtl checking failure

2016-01-21 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69416 --- Comment #2 from Wilco --- Started looking at this- it looks like line 1833 in emit-rtl.c gets miscompiled in combine: (insn 397 389 394 38 (set (reg:SI 462) (const_int 29 [0x1d])) ./emit-rtl.c:1833 49 {*movsi_aarch64} (nil))
