[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-10 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 Wilco wdijkstr at arm dot com changed: added wdijkstr at arm dot com to CC

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #10 from Wilco wdijkstr at arm dot com --- The loops shown are not the correct inner loops for those options - with -ffast-math they are vectorized. LLVM unrolls 2x but GCC doesn't. So the question is why GCC doesn't unroll vectorized

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #13 from Wilco wdijkstr at arm dot com --- (In reply to Andrew Pinski from comment #11) (In reply to Wilco from comment #10) The loops shown are not the correct inner loops for those options - with -ffast-math they are vectorized

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #15 from Wilco wdijkstr at arm dot com --- (In reply to Evandro Menezes from comment #14) Compiling the test-case above with just -O2, I can reproduce the code I mentioned initially and easily measure the cycle count to run

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-22 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 Wilco wdijkstr at arm dot com changed: added wdijkstr at arm dot com to CC

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-22 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #10 from Wilco wdijkstr at arm dot com --- (In reply to Andrew Pinski from comment #2) https://gcc.gnu.org/ml/gcc/2014-05/msg00160.html Note currently it is not possible to use FP registers for spilling using the hooks - basically

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #19 from Wilco wdijkstr at arm dot com --- (In reply to Evandro from comment #16) (In reply to Wilco from comment #15) Using -Ofast is not any different from -O3 -ffast-math when compiling non-Fortran code. As comment 10 shows

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-24 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #15 from Wilco wdijkstr at arm dot com --- (In reply to Evandro from comment #12) (In reply to Evandro from comment #11) Do you have an idea of the performance impact of this patch? At least in Dhrystone, it improved by over 2

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-24 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #16 from Wilco wdijkstr at arm dot com --- (In reply to Andrew Pinski from comment #13) (In reply to Wilco from comment #9) I committed a workaround (http://gcc.gnu.org/ml/gcc-patches/2014-09/msg00362.html) by increasing

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-27 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #18 from Wilco wdijkstr at arm dot com --- (In reply to Andrew Pinski from comment #17) (In reply to Wilco from comment #16) (In reply to Andrew Pinski from comment #13) (In reply to Wilco from comment #9) I committed

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-28 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #22 from Wilco wdijkstr at arm dot com --- (In reply to Evandro from comment #21) (In reply to ramana.radhakrish...@arm.com from comment #20) What's the kind of performance delta you see if you managed to unroll the loop just

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-28 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #24 from Wilco wdijkstr at arm dot com --- (In reply to Evandro from comment #23) (In reply to Wilco from comment #22) Unrolling alone isn't good enough in sum reductions. As I mentioned before, GCC doesn't enable any
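Splitting a floating-point sum reduction across several independent accumulators is a common way to go beyond what plain unrolling gives, since each accumulator forms its own add/FMA dependency chain that can overlap with the others in the pipeline. A minimal C sketch follows (the function names are hypothetical, not taken from the PR). It reassociates the sum, which changes rounding in general, so a compiler may only apply this transformation when reassociation is permitted, e.g. with -ffast-math.

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one, so the loop is
   bound by the latency of a single FP add per element. */
double sum_one_acc(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: even and odd elements feed separate dependency
   chains, which can execute in parallel. */
double sum_two_acc(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* odd element count: add the last element */
        s0 += a[i];
    return s0 + s1;         /* combine the partial sums once at the end */
}
```

For example, with double a[] = {1, 2, 3, 4, 5, 6, 7} both functions return 28.0, because small integers are summed exactly; for general data the two results may differ in the last bits.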

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64 - Improve Generic register_move_cost and memory_move_cost

2014-11-19 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 Wilco wdijkstr at arm dot com changed: Status from ASSIGNED to RESOLVED

[Bug middle-end/60580] aarch64 generates wrong code for __attribute__ ((optimize(no-omit-frame-pointer)))

2014-11-20 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60580 Wilco wdijkstr at arm dot com changed: added wdijkstr at arm dot com to CC

[Bug rtl-optimization/64151] [5 Regression] r218266 caused many regressions

2014-12-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64151 --- Comment #2 from Wilco wdijkstr at arm dot com --- (In reply to H.J. Lu from comment #1) Revert the reg_class change: diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c index 72c00cc..16fd6e8 100644 --- a/gcc/ira-costs.c +++ b/gcc/ira

[Bug rtl-optimization/64156] Subversion id 218266 breaks the big-endian 64-bit PowerPC build (wilco.dijks...@arm.com's mod to ira-costs.c)

2014-12-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64156 --- Comment #3 from Wilco wdijkstr at arm dot com --- (In reply to Michael Meissner from comment #2) Note, the fix proposed in PR64151 DOES NOT work on the PowerPC, so it may be a dup in terms of what change broke the build, but the potential

[Bug rtl-optimization/64242] New: Longjmp expansion incorrect on i386

2014-12-09 wdijkstr at arm dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com As PR rtl-optimization/64151 showed, the longjmp expansion on i386 is incorrect if the base register is spilled. It turns out it is trivial to write an example that reproduces this without my patch

[Bug rtl-optimization/64151] [5 Regression] r218266 caused many regressions

2014-12-09 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64151 --- Comment #7 from Wilco wdijkstr at arm dot com --- See PR rtl-optimization/64242 for the longjmp issue on i386.

[Bug rtl-optimization/64242] Longjmp expansion incorrect on i386

2014-12-09 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64242 --- Comment #3 from Wilco wdijkstr at arm dot com --- (In reply to H.J. Lu from comment #2) Dup of PR 59039? No that talks about not using __builtin_setjmp and __builtin_longjmp within the same function. I only used longjmp. Or are they so

[Bug rtl-optimization/65862] [MIPS] IRA/LRA issue: integers spilled to floating-point registers

2015-04-27 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862 --- Comment #4 from Wilco wdijkstr at arm dot com --- (In reply to Vladimir Makarov from comment #3) But I can not just revert the patch making ALL_REGS available to make coloring heuristic more fortunate for your particular case, as it reopens

[Bug rtl-optimization/65862] [MIPS] IRA/LRA issue: integers spilled to floating-point registers

2015-05-14 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862 --- Comment #13 from Wilco wdijkstr at arm dot com --- (In reply to Vladimir Makarov from comment #9) Created attachment 35503 [details] ira-hook.patch Here is the patch. Could you try it and give me your opinion about it. Thanks. I

[Bug middle-end/66462] New: GCC isinf/isnan/... builtins cause sNaN exceptions

2015-06-08 wdijkstr at arm dot com
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The isinf, isnan, isnormal, isfinite, fpclassify and signbit builtins use FP arithmetic to compute their result even with -fsignaling-nans (signbit only when -ffast-math

[Bug middle-end/66462] GCC isinf/isnan/... builtins cause sNaN exceptions

2015-06-08 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462 --- Comment #1 from Wilco wdijkstr at arm dot com --- Note when this is fixed, GLIBC math/math.h should be updated to enable the isinf builtins even with -fsignaling-nans.
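The classification builtins can avoid FP arithmetic entirely by inspecting the IEEE-754 encoding with integer operations, which never raise FP exceptions, even for a signaling NaN. A minimal C sketch, assuming double is IEEE-754 binary64; the *_bits names are hypothetical and this is not GCC's actual expansion:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double; memcpy is the well-defined way to
   type-pun in C and compilers typically turn it into a register move. */
static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

#define ABS_MASK 0x7fffffffffffffffULL  /* all bits except the sign bit */
#define EXP_ALL1 0x7ff0000000000000ULL  /* exponent all ones, mantissa 0: infinity */

/* NaN: exponent all ones and a non-zero mantissa. */
int isnan_bits(double x)    { return (bits_of(x) & ABS_MASK) > EXP_ALL1; }

/* Infinity: exponent all ones and a zero mantissa. */
int isinf_bits(double x)    { return (bits_of(x) & ABS_MASK) == EXP_ALL1; }

/* Finite: exponent not all ones (zero, subnormal or normal). */
int isfinite_bits(double x) { return (bits_of(x) & ABS_MASK) < EXP_ALL1; }

/* Sign bit, also set for -0.0 and for negative NaNs. */
int signbit_bits(double x)  { return (int)(bits_of(x) >> 63); }
```

Only integer masks and compares are executed, so passing a signaling NaN to any of these functions does not raise FE_INVALID.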

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-07-20 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 Wilco wdijkstr at arm dot com changed: added wdijkstr at arm dot com to CC

[Bug tree-optimization/66946] New: Spurious uninitialized warning

2015-07-20 wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- Created attachment 36016 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=36016&action=edit preprocessed iso-2022-cn-ext.c Since recently (around May) GCC6 has started to emit

[Bug tree-optimization/66946] Spurious uninitialized warning

2015-07-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66946 --- Comment #4 from Wilco wdijkstr at arm dot com --- (In reply to Andrew Pinski from comment #2) Comment on attachment 36021 [details] minimal example written == ((wchar_t) 0xfffd) Will ever be true or is there some sign extending going

[Bug tree-optimization/66946] Spurious uninitialized warning

2015-07-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66946 --- Comment #1 from Wilco wdijkstr at arm dot com --- Created attachment 36021 -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=36021&action=edit minimal example Minimal example which still reports the spurious warning.

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-06 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #33 from Wilco --- (In reply to Evandro from comment #32) > (In reply to Ramana Radhakrishnan from comment #31) > > (In reply to Evandro from comment #30) > > > The performance impact of always referring to constants as if they were

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-07 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #9 from Wilco --- (In reply to Andrew Pinski from comment #8) > (In reply to Wilco from comment #7) > > > > I think the problem is the constraints on *add3_pluslong allows > > > > all immediates. > > > > > > I'm not sure what you

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #15 from Wilco --- (In reply to Richard Henderson from comment #14) > (In reply to Wilco from comment #12) > > The only remaining question I had whether it would be possible to use > > peephole expansions rather than the late splits.

[Bug middle-end/71443] New: [7 regression] test case gcc.dg/plugin/must-tail-call-2.c reports error

2016-06-07 wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- There are 2 new failures in the tail-call-2.c test on recent trunk builds: FAIL: gcc.dg/plugin/must-tail-call-2.c -fplugin

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #12 from Wilco --- (In reply to Wilco from comment #11) > With your patch expand always emits add instructions with complex immediates > which then can't be optimized. OK, so I can change your patch do the right thing with 2 minor

[Bug target/69176] [6 Regression] ICE in in final_scan_insn, at final.c:2981 on aarch64-linux-gnu

2016-01-08 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69176 --- Comment #11 from Wilco --- (In reply to Richard Henderson from comment #10) > Created attachment 37267 [details] > proposed patch > > Andrew is exactly right re plus being special. > > The pluslong hoops that are being jumped through are

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-05 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #29 from Wilco --- (In reply to rguent...@suse.de from comment #28) > On Fri, 5 Feb 2016, alalaw01 at gcc dot gnu.org wrote: > > Should I raise a new bug for this, as both this and 53068 are CLOSED? > > I think this has been

[Bug tree-optimization/69368] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #2 from

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #5 from Wilco --- This still fails on AArch64 in exactly the same way with latest trunk - can someone reopen this? I don't seem to have the right permissions... (In reply to Richard Biener from comment #4) > So - can you please

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #8 from Wilco --- In a few functions GCC decides that the assignments in loops are redundant. The loops still execute but have their loads and stores removed. Eg. the first DO loop in MP2NRG should be: .L1027:

[Bug c++/69657] [6 Regression] abs() not inlined after including math.h

2016-02-03 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69657 --- Comment #5 from Wilco --- (In reply to Andrew Pinski from comment #4) > (In reply to Jonathan Wakely from comment #3) > > Recategorising as component=c++, and removing the regression marker (because > > the change in libstdc++ that reveals

[Bug tree-optimization/69336] Constant value not detected

2016-01-29 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69336 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #13 from

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #2 from Wilco --- Changing to c = 3 generates code after a short time. The issue is recursive calls to expand_ccmp_expr during the 2 possible options tried to determine costs. That makes the algorithm exponential. A fix would be to

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #3 from Wilco --- A simple workaround is to calculate cost1 early and only try the 2nd option if the cost is low (ie. it's not a huge expression that may evaluate into lots of ccmps). A slightly more advanced way would be to walk
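The blow-up described above can be reproduced with a simplified model: a chain of conditions is represented only by its depth, and trying both options at every level expands each sub-chain twice, so the work doubles per level. A second function applies the cutoff suggested in the comment (compute cost1 first, try the second option only while the cost is low). This is a hypothetical C sketch; expand_both, expand_pruned and ccmp_calls are invented names, not GCC's expand_ccmp_expr code.

```c
/* Counts how many (sub)expressions get expanded. */
unsigned long ccmp_calls;

/* Naive model: at every level both options are expanded in full to
   compare their costs, so calls(d) = 1 + 2 * calls(d - 1). */
int expand_both(int depth)
{
    ccmp_calls++;
    if (depth == 0)
        return 1;                             /* leaf comparison: cost 1 */
    int cost1 = expand_both(depth - 1) + 1;   /* option 1 */
    int cost2 = expand_both(depth - 1) + 1;   /* option 2 */
    return cost1 < cost2 ? cost1 : cost2;
}

/* Workaround model: compute cost1 first and only try the second option
   when the sub-expression is cheap, i.e. cost1 <= limit. */
int expand_pruned(int depth, int limit)
{
    ccmp_calls++;
    if (depth == 0)
        return 1;
    int cost1 = expand_pruned(depth - 1, limit) + 1;
    if (cost1 > limit)
        return cost1;                         /* large expression: skip option 2 */
    int cost2 = expand_pruned(depth - 1, limit) + 1;
    return cost1 < cost2 ? cost1 : cost2;
}
```

For a chain of depth 20 the naive model performs 2^21 - 1 = 2097151 expansions, while the pruned version with limit 4 performs only 32, since options are compared only in the few cheap levels near the leaves.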

[Bug libstdc++/69657] New: [6 Regression] abs() not inlined after including math.h

2016-02-03 wdijkstr at arm dot com
Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- Since a recent C++ header change abs() no longer gets inlined if we include an unrelated header before it. #include #include int wrap_abs

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-03 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #5 from Wilco --- Proposed patch: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00206.html

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #6 from Wilco --- This still fails on AArch64 in exactly the same way with latest trunk - can someone reopen this? I don't seem to have the right permissions... (In reply to Richard Biener from comment #4) > So - can you please

[Bug target/69416] [6 Regression] Nonsense rtl checking failure

2016-01-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69416 --- Comment #2 from Wilco --- Started looking at this- it looks like line 1833 in emit-rtl.c gets miscompiled in combine: (insn 397 389 394 38 (set (reg:SI 462) (const_int 29 [0x1d])) ./emit-rtl.c:1833 49 {*movsi_aarch64} (nil))

[Bug target/69416] [6 Regression] Nonsense rtl checking failure

2016-01-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69416 --- Comment #6 from Wilco --- (In reply to Andrew Pinski from comment #4) > Actually I think the problem is (const_int 8 [0x8]) does that make sense > for CC mode? I don't think it does. It should make sense as a CCmode immediate. It relies

[Bug target/69416] [6 Regression] Nonsense rtl checking failure

2016-01-21 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69416 --- Comment #7 from Wilco --- (In reply to Richard Henderson from comment #5) > Created attachment 37419 [details] > proposed patch > > I'm testing the following, but it does produce correct results > on a spot check of emit-rtl.c:1833. Yes,

[Bug fortran/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-17 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #41 from Wilco --- (In reply to Jerry DeLisle from comment #40) > Do you have a reduced test case of the Fortran code we can look at? See comment 13/14, the same common array is declared with different sizes in various places. > I

[Bug tree-optimization/69368] [6 Regression] spec2006 test case 416.gamess fails with the g++ 6.0 compiler starting with r232508

2016-02-01 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69368 --- Comment #9 from Wilco --- The loops get optimized away in dom2. The info this phase emits is hard to figure out, so it's not obvious why it thinks the array assignments are redundant (the array is used all over the place so clearly cannot be

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-10 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #15 from Wilco --- (In reply to Richard Biener from comment #14) > The regression in the original description looks severe enough to warrant > some fixing even if regressing some other cases. Agreed, I think the improvement from

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-11 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #17 from Wilco --- (In reply to Jiong Wang from comment #16) > * for the second patch at #c10, if we always do the following no matter > op0 is virtual & eliminable or not > > "op1 = force_operand (op1, NULL_RTX);" >

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-19 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #20 from Wilco --- (In reply to Richard Henderson from comment #19) > I wish that message had been a bit more complete with the description > of the performance issue. I must guess from this... > > > ldr dst1, [reg_base1,

[Bug middle-end/70140] New: Inefficient expansion of __builtin_mempcpy

2016-03-08 wdijkstr at arm dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The expansion of __builtin_mempcpy is inefficient on many targets (eg. AArch64, ARM, PPC). The issue is due to not using the same expansion options that memcpy uses
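mempcpy (a GNU extension) behaves like memcpy but returns a pointer just past the last byte written, so it can be expressed as memcpy followed by a pointer addition, letting the copy reuse exactly the same inline expansion that memcpy gets. A minimal C sketch; mempcpy_via_memcpy is a hypothetical helper, not GCC's or glibc's implementation:

```c
#include <stddef.h>
#include <string.h>

/* Same result as mempcpy (dst, src, n): copy n bytes, return dst + n.
   The copy is an ordinary memcpy, so it can be inlined or expanded just
   like any other memcpy of that size. */
static inline void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}
```

The only extra cost over memcpy is the final addition, which is why expanding mempcpy differently from memcpy is hard to justify.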

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-04 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #9 from Wilco --- (In reply to H.J. Lu from comment #8) > Inlining mempcpy uses a callee-saved register: > ... > > Not inlining mempcpy is preferred. If codesize is the only thing that matters... The cost is not at the caller side

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #5 from Wilco --- (In reply to amker from comment #4) > (In reply to ktkachov from comment #3) > > Started with r233136. > > That's why I forced base+offset out of memory reference and kept register > scaling in in the first place.

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #2 from

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #5 from Wilco --- (In reply to Jakub Jelinek from comment #3) > If some arch in glibc implements memcpy.S and does not implement mempcpy.S, > then obviously the right fix is to add mempcpy.S for that arch, usually it > is just a

[Bug testsuite/70055] gcc.target/i386/chkp-stropt-16.c is incompatible with glibc 2.23

2016-03-03 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70055 --- Comment #6 from Wilco --- (In reply to Jakub Jelinek from comment #4) > Note the choice of this in a header file is obviously wrong, if you at some > point fix this up, then apps will still call memcpy rather than mempcpy, > even when the

[Bug target/70048] [6 Regression][AArch64] Inefficient local array addressing

2016-03-07 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #12 from Wilco --- (In reply to Jiong Wang from comment #11) > (In reply to Richard Henderson from comment #10) > > Created attachment 37890 [details] > > second patch > > > > Still going through full testing, but I wanted to post

[Bug target/70048] New: [AArch64] Inefficient local array addressing

2016-03-02 wdijkstr at arm dot com
: target Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The following example generates very inefficient code on AArch64: int f1(int i) { int p[1000]; p[i] = 1; return p[i + 10] + p[i + 20]; } f1: sub sp, sp, #4000

[Bug target/70048] [AArch64] Inefficient local array addressing

2016-03-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70048 --- Comment #1 from Wilco --- The regression seem to have appeared on trunk around Feb 3-9.

[Bug middle-end/70801] New: IRA caller-saves does not support rematerialization

2016-04-26 wdijkstr at arm dot com
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- GCC emits the same code for caller-saves in all cases, even if the caller-save is an immediate which can be trivially rematerialized. The caller-save code should

[Bug middle-end/70802] New: IRA memory cost calculation incorrect for immediates

2016-04-26 wdijkstr at arm dot com
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The following code in ira-costs.c tries to improve the memory cost for rematerializeable loads. There are several issues with this though: 1. The memory cost can

[Bug middle-end/70861] Improve code generation of switch tables

2016-04-28 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70861 --- Comment #3 from Wilco --- (In reply to Andrew Pinski from comment #2) > Note I think if we had gotos instead of assignment here we should do the > similar thing for the switch table itself. Absolutely, that was my point. > Note also the

[Bug middle-end/70861] New: Improve code generation of switch tables

2016-04-28 wdijkstr at arm dot com
-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- GCC uses a very basic check to determine whether to use a switch table. A simple example from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11823 still generates a huge table

[Bug rtl-optimization/70946] New: Bad interaction between IVOpt and loop unrolling

2016-05-04 wdijkstr at arm dot com
: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- IVOpt chooses between using indexing for induction variables or incrementing pointers. Due to way loop unrolling works, a decision that is optimal if unrolling

[Bug rtl-optimization/70946] Bad interaction between IVOpt and loop unrolling

2016-05-04 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70946 --- Comment #1 from Wilco --- PR36712 seems related to this
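The two induction-variable strategies IVOpt chooses between can be illustrated at source level: an indexed form, where the address is recomputed from a base and an index inside each access, and a pointer-increment form, where the pointer itself is the induction variable. IVOpt runs on GIMPLE, before the RTL loop unroller, so its choice is made without seeing the copies the unroller later creates. A hypothetical C sketch of the two forms (function names invented for illustration):

```c
#include <stddef.h>

/* Indexed form: a single induction variable i; each access computes
   a + i * sizeof(int), e.g. via a scaled register-offset load. */
long sum_indexed(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer-increment form: the pointer p is the induction variable and
   the loop ends when it reaches the end pointer. */
long sum_pointer(const int *a, size_t n)
{
    long s = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}
```

Both compute the same result; which one yields better code after unrolling depends on the target's addressing modes.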

[Bug rtl-optimization/70961] Regrename ignores preferred_rename_class

2016-05-06 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961 --- Comment #5 from Wilco --- As for a simple example, Proc_4 in Dhrystone is a good one. With -O2 and -fno-rename-registers I get the following on Thumb-2: 00c8 : c8: b430 push {r4, r5} ca: f240 0300 movw r3,

[Bug rtl-optimization/70961] New: Regrename ignores preferred_rename_class

2016-05-05 wdijkstr at arm dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- When deciding which register to use regrename.c calls the target function preferred_rename_class. However in pass 2 in find_rename_reg it then just ignores this preference

[Bug rtl-optimization/70961] Regrename ignores preferred_rename_class

2016-05-06 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961 --- Comment #3 from Wilco --- (In reply to Eric Botcazou from comment #2) > Pass #2 ignores it since the preference simply couldn't be honored. In which case it should not rename that chain rather than just ignore the preference (and a

[Bug rtl-optimization/71022] GCC prefers register moves over move immediate

2016-05-10 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71022 --- Comment #2 from Wilco --- (In reply to Richard Biener from comment #1) > IRA might choose to do this as part of life-range splitting/shortening. Note > that reg-reg moves may be cheaper code-size wise (like on CISC archs with > non-fixed

[Bug rtl-optimization/71022] New: GCC prefers register moves over move immediate

2016-05-09 wdijkstr at arm dot com
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- When assigning the same immediate value to different registers, GCC will always CSE the immediate and emit a register move for subsequent uses. This creates

[Bug tree-optimization/71026] New: Missing division optimizations

2016-05-09 wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- With -Ofast GCC doesn't reassociate constant multiplies or negates away from divisors to allow for more reciprocal division optimizations. It is also possible to avoid divisions

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484 --- Comment #29 from Wilco --- (In reply to Jan Hubicka from comment #28) > > On SPEC2000 the latest changes look good, compared to the old predictor gap > > improved by 10% and INT/FP by 0.8%/0.6%. I'll run SPEC2006 tonight. > > It is rather

[Bug middle-end/77484] [6/7 Regression] Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2017-01-16 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77484 --- Comment #31 from Wilco --- (In reply to Jan Hubicka from comment #30) > > > > When I looked at gap at the time, the main change was the reordering of a > > few > > if statements in several hot functions. Incorrect block frequencies also >

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-08-23 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #10 from

[Bug rtl-optimization/69847] Spec 2006 403.gcc slows down with -mlra vs. reload on PowerPC

2016-08-23 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #27 from

[Bug tree-optimization/66946] Spurious uninitialized warning

2016-09-05 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66946 Wilco changed: Status from WAITING to RESOLVED; Resolution from ---

[Bug middle-end/77484] New: Static branch predictor causes ~6-8% regression of SPEC2000 GAP

2016-09-05 wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- Changes in the static branch predictor (around August last year) caused regressions on SPEC2000. The PRED_CALL predictor causes GAP

[Bug target/77455] New: [AArch64] eh_return implementation fails

2016-09-02 wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The __builtin_eh_return implementation on AArch64 generates incorrect code for many cases due to using an incorrect offset/pointer when writing the new return address to the stack

[Bug tree-optimization/65068] Improve rewriting for address type induction variables in IVOPT

2016-09-08 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068 Wilco changed: added wdijkstr at arm dot com to CC --- Comment #3 from

[Bug target/77455] [AArch64] eh_return implementation fails

2016-09-02 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77455 Wilco changed: added AArch64 to Target; Known to fail

[Bug middle-end/77568] [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77568 --- Comment #5 from Wilco --- (In reply to Andrew Pinski from comment #2) > Note there are two different issues here. Well they are 3 examples of the same underlying issue - don't do a CSE when it's not profitable. How they are resolved might

[Bug middle-end/77568] New: [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 wdijkstr at arm dot com
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- The recently introduced code hoisting aggressively moves common subexpressions that might otherwise be mergeable with other

[Bug middle-end/77568] [7 regression] CSE/PRE/Hoisting blocks common instruction contractions

2016-09-12 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77568 --- Comment #3 from Wilco --- (In reply to Andrew Pinski from comment #1) > I think this is just a pass ordering issue. We create fmas after PRE. > Maybe we should do it both before and after ... > Or enhance the pass which produces FMA to

[Bug middle-end/77580] New: Improve devirtualization

2016-09-13 Thread wdijkstr at arm dot com
Assignee: unassigned at gcc dot gnu.org Reporter: wdijkstr at arm dot com Target Milestone: --- A commonly used benchmark contains a hot loop which calls one of 2 virtual functions via a static variable which is set just before. A reduced example is: int f1(int x) { return x + 1; } int f2

[Bug tree-optimization/71026] Missing division optimizations

2016-08-24 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 --- Comment #3 from Wilco --- (In reply to ktkachov from comment #2) > The transforms > > int f4(float x) { return (1.0f / x) < 0.0f; } // -> x < 0.0f > int f5(float x) { return (x / 2.0f) <= 0.0f; }// -> x <= 0.0f > > can be

[Bug tree-optimization/32650] Convert p+strlen(p) to strchr(p, '\0') if profitable

2016-09-28 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32650 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from
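The rewrite named in this report's title works because strchr treats the terminating null character as part of the string. As a result, strchr(p, '\0') and p + strlen(p) point at the same byte. Below is a minimal C sketch of the two equivalent forms. The function names are hypothetical and for illustration only. Whether the rewrite is profitable is the open question in the report and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original form: pointer to the terminating NUL computed via strlen. */
const char *end_via_strlen(const char *p)
{
    return p + strlen(p);
}

/* Rewritten form: strchr considers the terminating NUL part of the
   string, so searching for '\0' returns a pointer to that NUL. */
const char *end_via_strchr(const char *p)
{
    return strchr(p, '\0');
}
```

Both functions return the same pointer for any valid null-terminated string, including the empty string.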

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-20 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #8 from Wilco --- (In reply to Bernd Edlinger from comment #7) > (In reply to Richard Earnshaw from comment #6) > > (In reply to Bernd Edlinger from comment #5) > > > (In reply to Wilco from comment #4) > > > > However dealing with

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-25 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #14 from Wilco --- (In reply to Bernd Edlinger from comment #13) > I am still trying to understand why thumb1 seems to outperform thumb2. > > Obviously thumb1 does not have the shiftdi3 pattern, > but even if I remove these from

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-20 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #12 from Wilco --- It looks like we need a different approach, I've seen the extra SETs use up more registers in some cases, and in other cases being optimized away early on... Doing shift expansion at the same time as all other DI

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-19 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-19 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #4 from Wilco --- (In reply to Bernd Edlinger from comment #3) > (In reply to Wilco from comment #2) > > (In reply to Bernd Edlinger from comment #1) > > > some background about this bug can be found here: > > > > > >

[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

2016-10-21 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041 --- Comment #11 from Wilco --- (In reply to ktkachov from comment #10) > Confirmed then. Wilco, if you're working on this can you please assign it to > yourself? Unfortunately the form doesn't allow me to do anything with the headers...

[Bug target/77308] surprisingly large stack usage for sha512 on arm

2016-10-31 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308 --- Comment #32 from Wilco --- (In reply to Bernd Edlinger from comment #31) > Sure, combine cant help, especially because it runs before split1. > > But I wondered why this peephole2 is not enabled: > > (define_peephole2 ; ldrd > [(set

[Bug target/71951] libgcc_s built with -fomit-frame-pointer on aarch64 is broken

2017-04-13 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71951 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #8 from

[Bug target/71951] libgcc_s built with -fomit-frame-pointer on aarch64 is broken

2017-07-27 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71951 --- Comment #11 from Wilco --- (In reply to Icenowy Zheng from comment #10) > In my environment (glibc 2.25, and both the building scripts of glibc and > gcc have -fomit-frame-pointer automatically enabled), this bug is not fully > resolved yet.

[Bug target/82439] Missing (x | y) == x simplifications

2017-10-05 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82439 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #2 from
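This illustration assumes the report concerns comparisons of the form (x | y) == x. That test holds exactly when every bit set in y is also set in x. It is therefore equivalent to (y & ~x) == 0 and to (x & y) == y. The excerpt does not say which form GCC should prefer. The sketch below only checks the equivalence exhaustively over 8-bit operands, and its function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form named in the report title. */
bool or_form(uint32_t x, uint32_t y) { return (x | y) == x; }

/* Equivalent: no bit of y lies outside x. */
bool andnot_form(uint32_t x, uint32_t y) { return (y & ~x) == 0; }

/* Equivalent: masking y with x leaves y unchanged. */
bool and_form(uint32_t x, uint32_t y) { return (x & y) == y; }

/* Compare all three forms over every pair of 8-bit operands. */
bool forms_agree_8bit(void)
{
    for (uint32_t x = 0; x < 256; x++)
        for (uint32_t y = 0; y < 256; y++) {
            bool a = or_form(x, y);
            if (a != andnot_form(x, y) || a != and_form(x, y))
                return false;
        }
    return true;
}
```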

[Bug middle-end/78809] Inline strcmp with small constant strings

2017-10-13 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #4 from
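The following hypothetical sketch shows what inlining strcmp against a small constant string amounts to when only equality is tested. It compares s against "ab" byte by byte, including the terminating null. The && chain stops at the first mismatch, so a shorter s is never read past its end. The excerpt does not show the expansion GCC would actually emit, or how ordered (<, >) results would be handled, so neither is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Reference behaviour: library call against the constant string. */
bool equals_ab_ref(const char *s)
{
    return strcmp(s, "ab") == 0;
}

/* Hand-inlined equality test (hypothetical): short-circuit evaluation
   stops at the first differing byte, so for s == "" or s == "a" the
   terminating NUL of s is the last byte read. */
bool equals_ab(const char *s)
{
    return s[0] == 'a' && s[1] == 'b' && s[2] == '\0';
}
```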

[Bug middle-end/78468] [8 regression] libgomp.c/reduction-10.c and many more FAIL

2017-09-06 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78468 Wilco changed: What|Removed |Added CC||wdijkstr at arm dot com --- Comment #38 from
