[Bug middle-end/90693] Missing popcount simplifications

2019-06-07 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90693 --- Comment #2 from Wilco --- (In reply to Dávid Bolvanský from comment #1) > >> __builtin_popcount (x) == 1 into x == (x & -x) > > > This will not work for x = 0. > > Should work: > x && x == (x & -x) > x && (x & x-1) == 0 Good point,
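
The x = 0 problem raised in comment #1 can be checked with a minimal sketch like the one below. The function names are made up for illustration, and `__builtin_popcount` is assumed to be available (GCC or Clang). The unguarded form accepts zero, while both guarded forms agree with the popcount test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference: exactly one bit set. */
static bool one_bit_popcount(uint32_t x) { return __builtin_popcount(x) == 1; }

/* Form from the original report; wrong for x == 0, since 0 == (0 & -0). */
static bool one_bit_naive(uint32_t x) { return x == (x & -x); }

/* Guarded forms suggested in comment #1. */
static bool one_bit_neg(uint32_t x) { return x && x == (x & -x); }
static bool one_bit_dec(uint32_t x) { return x && (x & (x - 1)) == 0; }

/* Compare the guarded forms against popcount for all x in [0, count). */
static void check_one_bit_forms(uint32_t count)
{
    for (uint32_t x = 0; x < count; x++) {
        bool ref = one_bit_popcount(x);
        assert(one_bit_neg(x) == ref);
        assert(one_bit_dec(x) == ref);
    }
}
```

Calling `check_one_bit_forms(1u << 16)` from a test driver exercises the low 16 bits; `one_bit_naive(0)` returns true, which is the mismatch discussed above.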

[Bug middle-end/64242] Longjmp expansion incorrect

2019-06-03 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64242 --- Comment #24 from Wilco --- Author: wilco Date: Mon Jun 3 13:55:15 2019 New Revision: 271870 URL: https://gcc.gnu.org/viewcvs?rev=271870&root=gcc&view=rev Log: Fix PR64242 - Longjmp expansion incorrect Improve the fix for PR64242. Various

[Bug driver/90684] New alignment options incorrectly report error

2019-06-03 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90684 --- Comment #3 from Wilco --- Author: wilco Date: Mon Jun 3 11:27:50 2019 New Revision: 271864 URL: https://gcc.gnu.org/viewcvs?rev=271864&root=gcc&view=rev Log: Fix alignment option parser (PR90684) Fix the alignment option parser to always allow up

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-06-03 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853 --- Comment #36 from Wilco --- (In reply to Orr Shalom Dvory from comment #35) > Hi, thanks for your respond. can someone mark this bug as need to be > improved? > Does anyone agree/disagree with my new proposed method? It's best to create a

[Bug middle-end/90693] New: Missing popcount simplifications

2019-05-31 Thread wilco at gcc dot gnu.org
Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- While GCC optimizes __builtin_popcount (x) != 0 into x != 0, we can also optimize __builtin_popcount (x) == 1 into x == (x & -x), and __builtin_popcount (x) > 1 into (x & (x-1)) != 0.
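
As a rough sanity check of the two rewrites that hold for every input (popcount != 0 and popcount > 1), the sketch below compares them against `__builtin_popcount` over a range of values. The helper names are illustrative only and not taken from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* popcount (x) > 1  rewritten as  (x & (x-1)) != 0:
   clearing the lowest set bit leaves something behind. */
static bool has_multiple_bits(uint32_t x) { return (x & (x - 1)) != 0; }

/* popcount (x) != 0  rewritten as  x != 0. */
static bool has_any_bit(uint32_t x) { return x != 0; }

/* Compare both rewrites against the builtin for all x in [0, count). */
static void check_popcount_rewrites(uint32_t count)
{
    for (uint32_t x = 0; x < count; x++) {
        assert((__builtin_popcount(x) > 1) == has_multiple_bits(x));
        assert((__builtin_popcount(x) != 0) == has_any_bit(x));
    }
}
```

Note that x = 0 is handled correctly here: `0 & (0 - 1)` is 0, so `has_multiple_bits(0)` is false, matching popcount.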

[Bug driver/90684] New alignment options incorrectly report error

2019-05-30 Thread wilco at gcc dot gnu.org
gnu.org |wilco at gcc dot gnu.org --- Comment #1 from Wilco --- Proposed patch: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg02030.html

[Bug middle-end/90684] New: New alignment options incorrectly report error

2019-05-30 Thread wilco at gcc dot gnu.org
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- GCC9 always reports an error when using -falign-functions=16:8:8 cc1: error: invalid number of arguments for ‘-falign-functions’ option: ‘16:8:8’ This is not working

[Bug middle-end/12849] testing divisibility by constant

2019-05-30 Thread wilco at gcc dot gnu.org
||wilco at gcc dot gnu.org Resolution|--- |FIXED --- Comment #6 from Wilco --- Fixed in GCC9.
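
For context, one well-known way to test divisibility by an odd constant without a division is to multiply by the modular inverse and compare against a threshold. The sketch below shows that technique for 3 on 32-bit unsigned values; it is an illustration of the general idea, not a claim about the exact code GCC 9 emits, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 0xAAAAAAAB is the inverse of 3 modulo 2^32 (3 * 0xAAAAAAAB == 1 mod 2^32).
   Multiples of 3 map to 0 .. 0x55555555 (UINT32_MAX / 3); everything else
   maps above that threshold. */
static bool divisible_by_3(uint32_t x)
{
    return x * UINT32_C(0xAAAAAAAB) <= UINT32_C(0x55555555);
}

/* Compare against the plain modulo for all x in [0, count). */
static void check_divisible_by_3(uint32_t count)
{
    for (uint32_t x = 0; x < count; x++)
        assert(divisible_by_3(x) == (x % 3 == 0));
}
```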

[Bug target/90317] [7/8/9/10] ICE for arm sha1h and wrong optimisations on sha1h/c/m/p

2019-05-29 Thread wilco at gcc dot gnu.org
||2019-05-29 CC||wilco at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Wilco --- Confirmed

[Bug tree-optimization/90594] New: [9/10 regression] Spurious popcount emitted

2019-05-23 Thread wilco at gcc dot gnu.org
-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- The following testcase emits a popcount which computes the final pointer value. This is redundant given the loop already computes the pointer value. The popcount causes

[Bug target/41999] Bug in generation of interrupt function code for ARM processor

2019-05-17 Thread wilco at gcc dot gnu.org
||wilco at gcc dot gnu.org Resolution|--- |WORKSFORME --- Comment #4 from Wilco --- Works since at least GCC4.5.4.

[Bug other/16996] [meta-bug] code size improvements

2019-05-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16996 Bug 16996 depends on bug 38570, which changed state. Bug 38570 Summary: [arm] -mthumb generates sub-optimal prolog/epilog https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38570 What|Removed |Added

[Bug target/38570] [arm] -mthumb generates sub-optimal prolog/epilog

2019-05-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38570 Wilco changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/38570] [arm] -mthumb generates sub-optimal prolog/epilog

2019-05-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38570 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #12 from

[Bug target/9831] [ARM] Peephole for multiple load/store could be more effective.

2019-05-17 Thread wilco at gcc dot gnu.org
||wilco at gcc dot gnu.org Resolution|--- |WONTFIX --- Comment #8 from Wilco --- There doesn't appear to be anything that can be improved here. Literal pool loads can't be easily peepholed into LDM, and there aren't many opportunities anyway.

[Bug target/42017] gcc compiling C for ARM has stopped using r14 in leaf functions?

2019-05-17 Thread wilco at gcc dot gnu.org
||wilco at gcc dot gnu.org Resolution|--- |WORKSFORME --- Comment #6 from Wilco --- This has been fixed since at least GCC5.4: https://www.godbolt.org/z/6IAGfh

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-05-07 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #20 from Wilco --- (In reply to Martin Liška from comment #19) > Created attachment 46265 [details] > Patch candidate v2 > > Update patch that should be fine. Tests on x86_64 work except: > FAIL:

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #18 from Wilco --- (In reply to Martin Liška from comment #14) > Created attachment 46262 [details] > Patch candidate > > Patch candidate that handles: > > $ cat ~/Programming/testcases/mempcpy.c > int *mempcopy2 (int *p, int *q,

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #17 from Wilco --- (In reply to Wilco from comment #16) > (In reply to Martin Sebor from comment #15) > > I just noticed I have been misreading mempcpy as memccpy and so making no > > sense. Sorry about that! Please ignore my

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #16 from Wilco --- (In reply to Martin Sebor from comment #15) > I just noticed I have been misreading mempcpy as memccpy and so making no > sense. Sorry about that! Please ignore my comments. I see, yes we have too many and the

[Bug rtl-optimization/90249] [9/10 Regression] Code size regression on thumb2 due to sub-optimal register allocation starting with r265398

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #5 from

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #12 from Wilco --- (In reply to Martin Sebor from comment #11) > My concern is that transforming memccpy to memcpy would leave little > incentive for libraries like glibc to provide a more optimal implementation. > Would implementing

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #9 from Wilco --- (In reply to Martin Sebor from comment #7) > Rather than unconditionally transforming mempcpy to memcpy I would prefer to > see libc implementations of memccpy optimized. WG14 N2349 discusses a > rationale for

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #6 from Wilco --- (In reply to Martin Liška from comment #5) > The discussion looks familiar to me. Isn't that PR70140, where I was > suggesting something like: > > https://marc.info/?l=gcc-patches&m=150166433909242&w=2 > > with a new

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-26 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #4 from Wilco --- (In reply to Jakub Jelinek from comment #3) > Because then you penalize properly maintained targets which do have > efficient mempcpy. And even if some targets don't have efficient mempcpy > right now, that doesn't

[Bug middle-end/90263] Calls to mempcpy should use memcpy

2019-04-26 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90263 --- Comment #2 from Wilco --- (In reply to Jakub Jelinek from comment #1) > As stated several times in the past, I strongly disagree. Why? GCC already does this for bzero and bcopy.

[Bug middle-end/90263] New: Calls to mempcpy should use memcpy

2019-04-26 Thread wilco at gcc dot gnu.org
Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- While GCC now inlines fixed-size mempcpy like memcpy, GCC still emits calls to mempcpy rather than converting to memcpy. Since most libraries, including GLIBC, do not have optimized
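
The conversion being requested relies on mempcpy returning the destination advanced by the copy length, so a call can be expressed as memcpy plus an addition. A minimal sketch, using a hypothetical wrapper name rather than the GNU `mempcpy` itself so that it builds without `_GNU_SOURCE`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same result as mempcpy (dst, src, n): copy n bytes, return dst + n. */
static void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Chain two copies, which is the typical use of the returned pointer. */
static void demo_mempcpy_via_memcpy(void)
{
    char buf[16];
    char *p = mempcpy_via_memcpy(buf, "abc", 3);
    p = mempcpy_via_memcpy(p, "def", 4);   /* 4 bytes: includes the NUL */
    assert(p == buf + 7);
    assert(strcmp(buf, "abcdef") == 0);
}
```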

[Bug middle-end/90262] New: Inline small constant memmoves

2019-04-26 Thread wilco at gcc dot gnu.org
Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- GCC does not inline fixed-size memmoves. However memmove can be as easily inlined as memcpy. The existing memcpy infrastructure could be reused/expanded for this - all loads would
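
One way to picture an inlined fixed-size memmove is to issue every load before any store, so overlapping source and destination cannot clobber unread data. The sketch below does this for 16 bytes; the function name is made up, and the load-then-store ordering is an assumption about how such an expansion would work, since the report text is cut off at that point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Move 16 bytes with memmove semantics: both loads happen before
   either store, so overlap in either direction is safe. */
static void move16(void *dst, const void *src)
{
    uint64_t lo, hi;
    memcpy(&lo, src, 8);                       /* load bytes 0..7  */
    memcpy(&hi, (const char *)src + 8, 8);     /* load bytes 8..15 */
    memcpy(dst, &lo, 8);                       /* store bytes 0..7  */
    memcpy((char *)dst + 8, &hi, 8);           /* store bytes 8..15 */
}

/* Compare against the library memmove on overlapping buffers. */
static void demo_move16(void)
{
    char a[32], b[32];
    for (int i = 0; i < 32; i++)
        a[i] = b[i] = (char)('A' + i);

    move16(a + 4, a);          /* forward overlap */
    memmove(b + 4, b, 16);
    assert(memcmp(a, b, 32) == 0);

    move16(a, a + 4);          /* backward overlap */
    memmove(b, b + 4, 16);
    assert(memcmp(a, b, 32) == 0);
}
```

The small fixed-size memcpy calls into locals are the usual portable way to express unaligned loads and stores in C; compilers generally turn them into plain moves.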

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #52 from Wilco --- (In reply to Segher Boessenkool from comment #48) > With just Peter's and Jakub's patch, it *improves* code size by 0.090%. > That does not fix this PR though :-/ But it does fix most of the codesize regression.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #47 from Wilco --- (In reply to Segher Boessenkool from comment #46) > With all three patches together (Peter's, mine, Jakub's), I get a code size > increase of only 0.047%, much more acceptable. Now looking what that diff > really

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #38 from Wilco --- (In reply to Segher Boessenkool from comment #37) > Yes, it is a balancing act. Which option works better? Well the question really is what is bad about movsi_compare0 that could be easily fixed? The move is for

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #54 from Wilco --- (In reply to Jeffrey A. Law from comment #53) > Realistically the register allocation issues are not going to get addressed > this cycle nor are improvements to the overall handling of RMW insns in > combine. So

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-04-14 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #52 from Wilco --- (In reply to Jeffrey A. Law from comment #49) > I think the insv_1 (and it's closely related insv_2) regressions can be > fixed by a single ior/and pattern in the backend or by hacking up combine a > bit. I'm

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #21 from Wilco --- (In reply to Vladimir Makarov from comment #20) > (In reply to Wilco from comment #19) > > (In reply to Peter Bergner from comment #18) > > > (In reply to Segher Boessenkool from comment #15) > > > > Popping

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #19 from Wilco --- (In reply to Peter Bergner from comment #18) > (In reply to Segher Boessenkool from comment #15) > > Popping a5(r116,l0) -- assign reg 3 > > Popping a3(r112,l0) -- assign reg 4 > > Popping

[Bug target/81800] [8/9 regression] on aarch64 ilp32 lrint should not be inlined as two instructions

2019-04-11 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81800 --- Comment #16 from Wilco --- (In reply to Jakub Jelinek from comment #15) > (In reply to Wilco from comment #14) > > (In reply to Jakub Jelinek from comment #13) > > > Patches should be pinged after a week if they aren't reviewed, > > >

[Bug target/81800] [8/9 regression] on aarch64 ilp32 lrint should not be inlined as two instructions

2019-04-11 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81800 --- Comment #14 from Wilco --- (In reply to Jakub Jelinek from comment #13) > Patches should be pinged after a week if they aren't reviewed, furthermore, > it is better to CC explicitly relevant maintainers. I've got about 10 patches waiting,

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834 --- Comment #16 from Wilco --- (In reply to kugan from comment #15) > (In reply to Wilco from comment #11) > > There is also something odd with the way the loop iterates, this doesn't > > look right: > > > > whilelo p0.s, x3, x4 > >

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-28 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #11 from

[Bug target/89607] Missing optimization for store of multiple registers on aarch64

2019-03-19 Thread wilco at gcc dot gnu.org
||wilco at gcc dot gnu.org Resolution|--- |FIXED Target Milestone|--- |9.0 --- Comment #9 from Wilco --- Fixed in GCC9 already, so closing.

[Bug ada/89493] [9 Regression] Stack smashing on armv7hl

2019-03-19 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89493 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #2 from

[Bug target/89752] [8/9 Regression] ICE in emit_move_insn, at expr.c:3723

2019-03-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89752 --- Comment #10 from Wilco --- It seems that rewriting "+rm" into "=rm" and "0" is not equivalent. Eg. __asm__ ("" : [a0] "=m" (A0) : "0" (A0)); gives a million warnings "matching constraint does not allow a register", so "0" appears to imply

[Bug target/89752] [8/9 Regression] ICE in emit_move_insn, at expr.c:3723

2019-03-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89752 --- Comment #4 from Wilco --- Small example which generates the same ICE on every GCC version: typedef struct { int x, y, z; } X; void f(void) { X A0, A1; __asm__ ("" : [a0] "+rm" (A0),[a1] "+rm" (A1)); } So it's completely invalid inline

[Bug target/89752] [8/9 Regression] ICE in emit_move_insn, at expr.c:3723

2019-03-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89752 --- Comment #3 from Wilco --- Full instruction: (insn 531 530 532 19 (parallel [ (set (mem/c:BLK (reg:DI 3842) [29 A0+0 S2 A64]) (asm_operands:BLK ("") ("=rm") 0 [ (mem/c:BLK (reg:DI 3846) [29

[Bug target/89752] [8/9 Regression] ICE in emit_move_insn, at expr.c:3723

2019-03-18 Thread wilco at gcc dot gnu.org
||2019-03-18 CC||wilco at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Wilco --- Confirmed. It ICEs in Eigen::internal::gebp_kernel, 2, 4, false, false>::operator() It seems to choke on this asm dur

[Bug target/89222] [7/8 regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-03-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89222 Wilco changed: What|Removed |Added Target Milestone|--- |8.5 Summary|[7/8/9 regression] ARM

[Bug target/89222] [7/8/9 regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-03-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89222 --- Comment #9 from Wilco --- Author: wilco Date: Tue Mar 5 15:04:01 2019 New Revision: 269390 URL: https://gcc.gnu.org/viewcvs?rev=269390&root=gcc&view=rev Log: [ARM] Fix PR89222 The GCC optimizer can generate symbols with non-zero offset from simple

[Bug tree-optimization/89437] [9 regression] incorrect result for sinl (atanl (x))

2019-03-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89437 Wilco changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug tree-optimization/89437] [9 regression] incorrect result for sinl (atanl (x))

2019-03-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89437 --- Comment #1 from Wilco --- Author: wilco Date: Mon Mar 4 12:36:04 2019 New Revision: 269364 URL: https://gcc.gnu.org/viewcvs?rev=269364&root=gcc&view=rev Log: Fix PR89437 Fix PR89437. Fix the sinatan-1.c testcase to not run without a C99 target

[Bug tree-optimization/86829] Missing sin(atan(x)) and cos(atan(x)) optimizations

2019-02-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86829 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #8 from

[Bug tree-optimization/89437] New: [9 regression] incorrect result for sinl (atanl (x))

2019-02-21 Thread wilco at gcc dot gnu.org
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- A recently added optimization uses an inline expansion for sinl (atanl (x)). As it involves computing sqrtl (x * x + 1) which can overflow for large x

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-02-20 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 --- Comment #25 from Wilco --- (In reply to Steve Ellcey from comment #24) > See email strings at: > > https://gcc.gnu.org/ml/fortran/2019-01/msg00276.html > https://gcc.gnu.org/ml/fortran/2019-02/msg00057.html > > For more discussion. Sure,

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-02-19 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #23 from

[Bug middle-end/89037] checking ice emitting 128-bit bit-field initializer

2019-02-19 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89037 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org Version|9.0

[Bug target/85711] [8 regression] ICE in aarch64_classify_address, at config/aarch64/aarch64.c:5678

2019-02-15 Thread wilco at gcc dot gnu.org
||2019-02-15 CC||wilco at gcc dot gnu.org Summary|ICE in |[8 regression] ICE in |aarch64_classify_address, |aarch64_classify_address

[Bug target/89190] [8/9 regression][ARM] armv8-m.base invalid ldm ICE

2019-02-13 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89190 --- Comment #2 from Wilco --- Author: wilco Date: Wed Feb 13 16:22:25 2019 New Revision: 268848 URL: https://gcc.gnu.org/viewcvs?rev=268848&root=gcc&view=rev Log: [ARM] Fix Thumb-1 ldm (PR89190) This patch fixes an ICE in the Thumb-1 LDM peepholer.

[Bug target/89222] [7/8/9 regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-02-13 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89222 --- Comment #8 from Wilco --- (In reply to Wilco from comment #7) > Patch: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00780.html Updated patch: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00947.html

[Bug tree-optimization/86637] [9 Regression] ICE: tree check: expected block, have in inlining_chain_to_json, at optinfo-emit-json.cc:293

2019-02-11 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86637 --- Comment #14 from Wilco --- Author: wilco Date: Mon Feb 11 18:14:37 2019 New Revision: 268777 URL: https://gcc.gnu.org/viewcvs?rev=268777&root=gcc&view=rev Log: [COMMITTED] Fix pthread errors in pr86637-2.c Fix test errors on targets which do not

[Bug target/89222] [7/8/9 regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-02-11 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89222 --- Comment #7 from Wilco --- Patch: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00780.html

[Bug target/89222] [7.x regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-02-08 Thread wilco at gcc dot gnu.org
||2019-02-08 Assignee|unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org Ever confirmed|0 |1

[Bug target/89222] [7.x regression] ARM thumb-2 misoptimisation of func ptr call with -O2 or -Os

2019-02-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89222 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #5 from

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-02-06 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #8 from

[Bug rtl-optimization/89195] [7/8/9 regression] Corrupted stack offset after combine

2019-02-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89195 --- Comment #11 from Wilco --- (In reply to Segher Boessenkool from comment #9) > That patch is pre-approved if it regchecks fine (on more than just x86). > Thanks! check-gcc is clean on aarch64_be-none-elf

[Bug rtl-optimization/89195] [7/8/9 regression] Corrupted stack offset after combine

2019-02-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89195 --- Comment #10 from Wilco --- (In reply to Jakub Jelinek from comment #8) > Created attachment 45606 [details] > gcc9-pr89195.patch > > Now in patch form (untested so far). That works fine indeed. It avoids accessing the object out of bounds

[Bug rtl-optimization/89195] [7/8/9 regression] Corrupted stack offset after combine

2019-02-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89195 --- Comment #4 from Wilco --- (In reply to Segher Boessenkool from comment #3) > (In reply to Wilco from comment #1) > > len is unsigned HOST_WIDE_INT, so bits_to_bytes_round_down does an unsigned > > division... > > That shouldn't make a

[Bug rtl-optimization/89195] [7/8/9 regression] Corrupted stack offset after combine

2019-02-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89195 --- Comment #1 from Wilco --- make_extraction does: if (MEM_P (inner)) { poly_int64 offset; /* POS counts from lsb, but make OFFSET count in memory order. */ if (BYTES_BIG_ENDIAN) offset

[Bug rtl-optimization/89195] [7/8/9 regression] Corrupted stack offset after combine

2019-02-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89195 Wilco changed: What|Removed |Added Target||aarch64 Target Milestone|---

[Bug rtl-optimization/89195] New: [7/8/9 regression] Corrupted stack offset after combine

2019-02-04 Thread wilco at gcc dot gnu.org
Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- The following testcase generates incorrect stack offsets on AArch64 since GCC7 when compiled with -O1 -mbig-endian: struct S

[Bug target/89190] [8/9 regression][ARM] armv8-m.base invalid ldm ICE

2019-02-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89190 Wilco changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed|

[Bug target/89190] New: [8/9 regression][ARM] armv8-m.base invalid ldm ICE

2019-02-04 Thread wilco at gcc dot gnu.org
: target Assignee: unassigned at gcc dot gnu.org Reporter: wilco at gcc dot gnu.org Target Milestone: --- The following testcase ICEs with -march=armv8-m.base on arm.none.eabi: long long a; int b, c; int d(int e, int f) { return e << f; } void g() { long long h;

[Bug ipa/89104] ICE: Segmentation fault (in tree_int_cst_elt_check)

2019-01-31 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89104 --- Comment #5 from Wilco --- (In reply to Jakub Jelinek from comment #4) > I really don't like these aarch64 warnings, declare simd is an optimization > (admittedly with ABI consequences) and warning about this by default is > weird, > + it is

[Bug ipa/89104] ICE: Segmentation fault (in tree_int_cst_elt_check)

2019-01-30 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89104 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #3 from

[Bug target/89101] [Aarch64] vfmaq_laneq_f32 generates unnecessary dup instrcutions

2019-01-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 Wilco changed: What|Removed |Added Status|WAITING |NEW Known to work|

[Bug target/89101] [Aarch64] vfmaq_laneq_f32 generates unnecessary dup instrcutions

2019-01-29 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #1 from

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-25 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #23 from Wilco --- (In reply to ktkachov from comment #22) > helps even more. On Cortex-A72 it gives a bit more than 6% (vs 3%) > improvement on parest, and about 5.3% on a more aggressive CPU. > I tried unrolling 8x in a similar

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-25 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #32 from Wilco --- Author: wilco Date: Fri Jan 25 13:29:06 2019 New Revision: 268265 URL: https://gcc.gnu.org/viewcvs?rev=268265&root=gcc&view=rev Log: [PATCH][AArch64] Fix generation of tst (PR87763) The TST instruction no longer matches in

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #21 from Wilco --- (In reply to rguent...@suse.de from comment #20) > On Thu, 24 Jan 2019, wilco at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 > > > > --

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #19 from Wilco --- (In reply to rguent...@suse.de from comment #18) > > 1) Unrolling for load-pair-forming vectorisation (Richard Sandiford's > > suggestion) > > If that helps, sure (I'd have guessed uarchs are going to split >

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-22 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #23 from Wilco --- Author: wilco Date: Tue Jan 22 17:49:46 2019 New Revision: 268159 URL: https://gcc.gnu.org/viewcvs?rev=268159&root=gcc&view=rev Log: Fix vect-nop-move.c test Fix a failing test - changes in Combine mean the test now fails

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-22 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #22 from Wilco --- (In reply to Steve Ellcey from comment #21) > If I look at this specific example: > > int f2 (int x, int y) > { > return (x & ~0x0ff000) | ((y & 0x0ff) << 12); > } > > Is this because of x0 (a hard register)
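
For readers unfamiliar with the pattern, the f2 example quoted above is a bitfield insert: it replaces bits 12..19 of x with the low 8 bits of y. The sketch below restates it next to a generic helper to make that explicit. `insert_bits` is a hypothetical name introduced here, and nothing in it is specific to how combine or the AArch64 backend matches the pattern.

```c
#include <assert.h>
#include <stdint.h>

/* The function quoted in comment #21. */
static int f2(int x, int y)
{
    return (x & ~0x0ff000) | ((y & 0x0ff) << 12);
}

/* Generic form: replace `width` bits of x starting at bit `lsb`
   with the low `width` bits of y. */
static uint32_t insert_bits(uint32_t x, uint32_t y, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 32 && lsb + width <= 32);
    uint32_t mask = ((UINT32_C(1) << width) - 1) << lsb;
    return (x & ~mask) | ((y << lsb) & mask);
}

static void demo_insert_bits(void)
{
    /* Bits 12..19 of 0x12345678 are 0x45; inserting 0xAB gives 0x123AB678. */
    assert(f2(0x12345678, 0xAB) == 0x123AB678);
    assert(insert_bits(0x12345678u, 0xABu, 12, 8) == 0x123AB678u);

    /* Only the low 8 bits of y take part in the insert. */
    assert((uint32_t)f2(-1, 0x100) == insert_bits(0xFFFFFFFFu, 0x100u, 12, 8));
}
```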

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #19 from Wilco --- (In reply to Segher Boessenkool from comment #18) > https://gcc.gnu.org/ml/gcc/2019-01/msg00112.html Thanks, I hadn't noticed that yet... I need to look at it in more detail, but are you saying that combine no

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #17 from Wilco --- (In reply to Vladimir Makarov from comment #14) > I've checked cvtf_1.c generated code and I don't see additional fmov > anymore. I guess it was fixed by an ira-costs.c change (a special > consideration of moves

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #13 from Wilco --- (In reply to Segher Boessenkool from comment #12) > Before the change combine forwarded all argument (etc.) hard registers > wherever > it could, doing part of RA's job (and doing a lousy job of it). If after the

[Bug rtl-optimization/87763] [9 Regression] aarch64 target testcases fail after r265398

2019-01-21 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87763 --- Comment #11 from Wilco --- A SPEC2006 run shows the codesize cost of make_more_copies is 0.05%. Practically all tests are worse, the largest increases are perlbench at 0.20%, gromacs 0.12%, calculix 0.12%, soplex 0.08%, xalancbmk 0.07%, wrf

[Bug middle-end/88560] [9 Regression] armv8_2-fp16-move-1.c and related regressions after r266385

2019-01-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560 --- Comment #6 from Wilco --- (In reply to Vladimir Makarov from comment #5) > We have too many tests checking expected generated code. We should more > focus on overall effect of the change. SPEC would be a good criterium > although it is

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #16 from Wilco --- (In reply to rguent...@suse.de from comment #15) > which is what I refered to for branch prediction. Your & prompts me > to a way to do sth similar as duffs device, turning the loop into a nest. > > head: >

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-17 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #14 from Wilco --- (In reply to rguent...@suse.de from comment #13) > Usually the peeling is done to improve branch prediction on the > prologue/epilogue. Modern branch predictors do much better on a loop than with this kind of
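
As a point of reference for the unrolling discussion, the sketch below shows a 4x unrolled loop whose leftover iterations run in a small remainder loop instead of peeled straight-line code. It is a hand-written illustration with made-up names, not output from GCC's unroller, and it makes no claim about which form predicts better on a given core.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reference loop. */
static long long sum_simple(const int *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* 4x unrolled main loop with independent accumulators, followed by a
   remainder loop that handles the last n % 4 elements. */
static long long sum_unrolled4(const int *a, size_t n)
{
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t main_end = n & ~(size_t)3;   /* largest multiple of 4 <= n */
    size_t i = 0;

    for (; i < main_end; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long long s = s0 + s1 + s2 + s3;
    for (; i < n; i++)                  /* remainder: at most 3 iterations */
        s += a[i];
    return s;
}

static void demo_sum_unrolled4(void)
{
    int a[11] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    for (size_t n = 0; n <= 11; n++)
        assert(sum_unrolled4(a, n) == sum_simple(a, n));
}
```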

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #37 from Wilco --- (In reply to rsand...@gcc.gnu.org from comment #35) > Yeah, the expr.c patch makes the original testcase work, but we still fail > for: That's the folding in ccp1 after inlining, which will require a similar fix.

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #34 from Wilco --- With just the expr.c patch the gcc regression tests all pass on big-endian AArch64. Interestingly this includes the new torture test, ie. it does not trigger the union bug.

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #33 from Wilco --- (In reply to Richard Biener from comment #32) > > > > Index: gcc/expr.c > > === > > --- gcc/expr.c (revision 267553) > > +++ gcc/expr.c (working

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #7 from Wilco --- (In reply to rguent...@suse.de from comment #6) > On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 > > > > --- Comment #5 from

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #5 from Wilco --- (In reply to Wilco from comment #4) > (In reply to ktkachov from comment #2) > > Created attachment 45386 [details] > > aarch64-llvm output with -Ofast -mcpu=cortex-a57 > > > > I'm attaching the full LLVM aarch64

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
Wilco changed:
What |Removed |Added
CC   |        |wilco at gcc dot gnu.org
--- Comment #4 from

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #29 from Wilco --- (In reply to Richard Biener from comment #26) > Did anybody test the patch? Testing on x86_64 will be quite pointless... Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes: ubfx x1, x20, 2, 16

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #27 from Wilco --- (In reply to Eric Botcazou from comment #22) > > Is it really pure RTL, therefore not used in tree? So the above patch using > > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it? > > I wouldn't say

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #25 from Wilco --- (In reply to rguent...@suse.de from comment #17) > On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 > > > > --- Comment #16 fro

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #21 from Wilco --- (In reply to Eric Botcazou from comment #20) > > BITS_BIG_ENDIAN is just a convenience to the target code writer. The other > > four do matter, and are quite obvious really (and all four are necessary). > > Yes,

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #19 from Wilco --- (In reply to Segher Boessenkool from comment #18) > Well, it is always possible to generate code with the opposite endianness to > what the hardware "wants". It just won't be very fast code. > > BITS_BIG_ENDIAN

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2019-01-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #20 from Wilco --- I see Kyrill added some examples that show LLVM knows how to unroll loops: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 This kind of thing is much worse than a trailing loop, both for branch prediction and

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2019-01-08 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 --- Comment #19 from Wilco --- (In reply to Jakub Jelinek from comment #18) > The duffs device doesn't need to be done with computed jump, it can be done > with 3 conditional branches + 3 comparisons too. The advantage of doing > that is
