[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #20 from Jakub Jelinek ---
Assuming fixed then, please reopen if not.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109 --- Comment #19 from Bernd Edlinger --- Hope all is now working again.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #18 from Bernd Edlinger ---
Author: edlinger
Date: Fri Aug 16 16:37:04 2019
New Revision: 274578

URL: https://gcc.gnu.org/viewcvs?rev=274578&root=gcc&view=rev
Log:
2019-08-16  Bernd Edlinger

        Backport from mainline
        2019-08-16  Bernd Edlinger

        PR tree-optimization/91109
        * lra-int.h (lra_need_for_scratch_reg_p): Declare.
        * lra.c (lra): Use lra_need_for_scratch_reg_p.
        * lra-spills.c (lra_need_for_scratch_reg_p): New function.

Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/lra-int.h
    branches/gcc-9-branch/gcc/lra-spills.c
    branches/gcc-9-branch/gcc/lra.c
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #17 from Bernd Edlinger ---
Author: edlinger
Date: Fri Aug 16 16:31:13 2019
New Revision: 274577

URL: https://gcc.gnu.org/viewcvs?rev=274577&root=gcc&view=rev
Log:
2019-08-16  Bernd Edlinger

        Backport from mainline
        2019-08-07  Bernd Edlinger

        PR tree-optimization/91109
        * lra-remat.c (update_scratch_ops): Remove assignment of the
        hard register.

Modified:
    branches/gcc-9-branch/gcc/ChangeLog
    branches/gcc-9-branch/gcc/lra-remat.c
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #16 from Bernd Edlinger ---
Author: edlinger
Date: Fri Aug 16 15:34:47 2019
New Revision: 274573

URL: https://gcc.gnu.org/viewcvs?rev=274573&root=gcc&view=rev
Log:
2019-08-16  Bernd Edlinger

        PR tree-optimization/91109
        * lra-int.h (lra_need_for_scratch_reg_p): Declare.
        * lra.c (lra): Use lra_need_for_scratch_reg_p.
        * lra-spills.c (lra_need_for_scratch_reg_p): New function.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-int.h
    trunk/gcc/lra-spills.c
    trunk/gcc/lra.c
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #15 from Christophe Lyon ---
Since r274532 (gcc-9-branch), I am seeing:
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

target arm-none-linux-gnueabi
--with-mode arm
--with-cpu cortex-a9

The same test passes on arm-none-linux-gnueabihf, or using --with-mode thumb
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #14 from Bernd Edlinger ---
I can reproduce with trunk:

arm-linux-gnueabihf-gcc -S -O2 -mthumb -flto -fno-use-linker-plugin 20040709-1.c

but not with -O3 -g, nor with gcc-9 and my fix applied.

Nevertheless, the second patch is clearly needed to handle the case
where rematerialized instructions have scratches but nothing needs to
be spilled, so the loop needs to continue with lra_assign instead of
lra_spill.
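
The control-flow decision described in comment #14 can be sketched in a few
lines of C. This is only a toy model under stated assumptions: struct toy_reg,
toy_need_for_scratch_reg_p and toy_choose_next_step are hypothetical names
invented for illustration, and the code is not the committed
lra_need_for_scratch_reg_p or the loop in lra.c. It shows only the idea that a
former scratch which is still referenced but has no hard register forces
another assignment round even when nothing needs to be spilled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-register allocator state (hypothetical, not GCC's). */
struct toy_reg {
    int nrefs;            /* number of references left in the insn stream */
    int hard_regno;       /* assigned hard register, or -1 if unassigned */
    bool former_scratch;  /* pseudo was created from a (scratch) operand */
};

/* True if some referenced former scratch still lacks a hard register. */
bool toy_need_for_scratch_reg_p(const struct toy_reg *regs, size_t n)
{
    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (regs[i].nrefs != 0 && regs[i].hard_regno < 0
            && regs[i].former_scratch)
            return true;
    return false;
}

enum toy_next_step { TOY_DONE, TOY_ASSIGN_AGAIN, TOY_SPILL };

/* Decide what the allocation loop does next in this toy model:
   spill if spills are needed, otherwise run assignment again while a
   scratch is still unassigned, otherwise stop. */
enum toy_next_step toy_choose_next_step(bool need_spills,
                                        const struct toy_reg *regs, size_t n)
{
    if (need_spills)
        return TOY_SPILL;
    if (toy_need_for_scratch_reg_p(regs, n))
        return TOY_ASSIGN_AGAIN;
    return TOY_DONE;
}
```

In this model, a rematerialized insn whose scratch copy was left unassigned
yields TOY_ASSIGN_AGAIN rather than TOY_DONE, which corresponds to the
"continue with lra_assign" case mentioned above.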
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #13 from Bernd Edlinger ---
Created attachment 46704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46704&action=edit
another untested patch
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #12 from Christophe Lyon ---
Indeed, although r274163 fixes the problem I reported, it also introduces a
regression when compiling the very same testcase but adding -mthumb:
FAIL: gcc.c-torture/execute/20040709-1.c -O2 (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O3 -g (internal compiler error)

My gcc.log says:
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c: In function 'retmeD':
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:19:10: note: parameter passing for argument of type 'struct D' changed in GCC 9.1
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:95:64: note: in expansion of macro 'T'
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c: In function 'testI':
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:100:75: error: insn does not satisfy its constraints:
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:55:10: note: in definition of macro 'T'
(insn 311 122 309 8 (parallel [
            (set (reg:SI 3 r3 [266])
                (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg:SI 10 r10 [265]))
                            (zero_extend:DI (reg:SI 8 r8 [267])))
                        (const_int 32 [0x20]))))
            (clobber (scratch:SI))
        ]) "/gcc/testsuite/gcc.c-torture/execute/20040709-1.c":100:73 70 {*umulsi3_highpart_v6}
     (nil))
during RTL pass: reload
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:100:75: internal compiler error: in extract_constrain_insn, at recog.c:2211
/gcc/testsuite/gcc.c-torture/execute/20040709-1.c:55:10: note: in definition of macro 'T'
0x5a7d5d _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /gcc/rtl-error.c:108
0x5a7d83 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /gcc/rtl-error.c:119
0xb7b85d extract_constrain_insn(rtx_insn*)
        /gcc/recog.c:2211
0xa629b7 check_rtl
        /gcc/lra.c:2184
0xa67def lra(_IO_FILE*)
        /gcc/lra.c:2622
0xa19f49 do_reload
        /gcc/ira.c:5522
0xa19f49 execute
        /gcc/ira.c:5706
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #11 from Bernd Edlinger ---
No, it needs to be back-ported to gcc-9.3 (I am still reg-testing),
and Vladimir Makarov wrote the following:

https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00463.html

> Still I think more work on the PR is needed.  If subsequent LRA sub-pass
> spills some pseudo to assign a hard register to the scratch of the
> rematerialized insn as it was in the original insn, it might make this
> rematerialization unprofitable.  So I'll think how to avoid the
> unprofitable rematerialization in such cases and would like to work on
> this PR more.
>
> Please, do not close the PR after committing the patch.  I am going to
> work on it more when stage3 starts.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #10 from Martin Liška ---
Bernd: Can the bug be marked as resolved?
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #9 from Bernd Edlinger ---
Author: edlinger
Date: Wed Aug  7 13:45:06 2019
New Revision: 274163

URL: https://gcc.gnu.org/viewcvs?rev=274163&root=gcc&view=rev
Log:
2019-08-07  Bernd Edlinger

        PR tree-optimization/91109
        * lra-remat.c (update_scratch_ops): Remove assignment of the
        hard register.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-remat.c
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109 --- Comment #8 from Bernd Edlinger --- Patch is posted here: https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00305.html
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #7 from Bernd Edlinger ---
I can reproduce this defect with gcc-9 (!)

$ ../gcc-9-branch/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64-1 --target=arm-linux-gnueabihf --enable-languages=c,c++ --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

$ TMP=. arm-linux-gnueabihf-gcc -O2 -flto -save-temps -fdump-rtl-all-all 20040709-1.c
$ grep same *.reload
           Assigning the same 6155 to r11
$ vi *.ltrans0.s

Look for the last umull (it is always the last one):

        str     r5, [fp]
        umull   fp, r3, r7, r8
        [...]
        str     r6, [fp]

But the same does not happen for gcc-8:

$ grep same *.reload

and the assembler listing looks okay.
However, update_scratch_ops looks exactly identical there,
so the issue is likely just hidden in gcc-8.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #6 from Bernd Edlinger ---
With this patch the relevant part of the reload dump file looks different:

(insn 3414 6591 6682 129 (set (mem/c:SI (reg/f:SI 5 r5 [5715]) [1 s.5566D.5531+0 S4 A32])
        (reg:SI 6 r6 [orig:828 _821 ] [828])) "20040709-1.c":13:5 654 {*arm_movsi_vfp}
     (nil))
[...]
(insn 6826 3453 6816 129 (parallel [
            (set (reg:SI 3 r3 [4187])
                (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg:SI 11 fp [4186]))
                            (zero_extend:DI (reg:SI 7 r7 [4188])))
                        (const_int 32 [0x20]))))
            (clobber (reg:SI 1 r1 [5970]))
        ]) "20040709-1.c":108:291 70 {*umulsi3_highpart_v6}
     (nil))
[...]
(insn 3509 3530 3531 132 (set (mem/c:SI (reg/f:SI 5 r5 [5715]) [1 s.5566D.5531+0 S4 A32])
        (reg:SI 14 lr [orig:692 D.6083 ] [692])) "20040709-1.c":13:5 654 {*arm_movsi_vfp}
     (nil))

This time insn 6826 is able to choose a different register than r5,
and most importantly the live-range info is correct, since
the old register r5970 is renamed to r6374 temporarily:

      Creating newreg=6374 from oldreg=5970, assigning class GENERAL_REGS to scratch pseudo copy r6374
 6816: r6364:SI=r4187:SI
      REG_DEAD r4187:SI
    Inserting rematerialization insn before:
 6826: {r6364:SI=trunc(zero_extend(r4186:SI)*zero_extend(r4188:SI) 0>>0x20);clobber r6374:SI;}
      REG_UNUSED r6374:SI

This is visible in the live ranges (the r6374 entry was not there before):

 r6007: [59..59]
 r6374: [1572..1572]
Compressing live ranges: from 3802 to 75 - 1%
Ranges after the compression:
[...]
 r6007: [1..1]
 r6374: [40..40]

Since the rematerialized instruction is now able to use r1,
there is no conflict any more.

So I believe the patch is a straight improvement over the previous
state of affairs.

It looks like this is a potentially catastrophic bug that is not
related to -flto or to any specific target architecture.
From my testing it is likely that it was already present in gcc-9.0.1.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #5 from Bernd Edlinger ---
Created attachment 46654
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46654&action=edit
untested patch

It looks like update_scratch_ops creates a copy of the original
scratch register, but the new scratch register has no working
live range info.

I don't know a correct solution for the underlying problem,
but removing the assignment to reg_renumber seems to fix the
test case.
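
The suspicion in comment #5 can be illustrated with a small toy model in C.
Everything here is an assumption made for illustration: struct toy_pseudo,
toy_conflict and toy_copy_scratch are hypothetical names and have nothing to
do with GCC's real data structures. The sketch shows why a scratch copy that
inherits the original hard register at a new program point can collide with
another value that is live in that register, while an unassigned copy leaves
the choice to a later assignment pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy pseudo register: one closed live range and an assigned hard reg. */
struct toy_pseudo {
    int hard_regno;  /* assigned hard register, or -1 if none */
    int start;       /* first program point where the value is live */
    int end;         /* last program point where the value is live */
};

/* True if both pseudos occupy the same hard register while both are live. */
bool toy_conflict(const struct toy_pseudo *a, const struct toy_pseudo *b)
{
    if (a->hard_regno < 0 || b->hard_regno < 0)
        return false;
    if (a->hard_regno != b->hard_regno)
        return false;
    return a->start <= b->end && b->start <= a->end;
}

/* Copy a scratch for a rematerialized insn placed at program point `point`.
   With inherit_hard_reg set, the copy keeps the original's hard register
   (the suspected behaviour); otherwise it stays unassigned. */
struct toy_pseudo toy_copy_scratch(const struct toy_pseudo *orig, int point,
                                   bool inherit_hard_reg)
{
    struct toy_pseudo copy;
    assert(orig != NULL);
    copy.hard_regno = inherit_hard_reg ? orig->hard_regno : -1;
    copy.start = point;
    copy.end = point;
    return copy;
}
```

For example, if the original scratch used hard register 5 at point 2, and an
address is live in hard register 5 over points 10 to 40, then a copy placed at
point 20 that inherits register 5 conflicts with the address, whereas an
unassigned copy does not.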
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Bernd Edlinger changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernd.edlinger at hotmail dot de

--- Comment #4 from Bernd Edlinger ---
Hmm, funny, I saw this test case failing since February at least:

https://gcc.gnu.org/ml/gcc-testresults/2019-02/msg02686.html

FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

--with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

I have not looked into it before, but to me it looks like a reload bug:

        str     r6, [r5]            <= r5 still valid
        stm     r9, {r0, r1, r2, r3}
        umull   r5, r3, r7, fp      <= r5 clobbered
        ldr     r2, [r4, #176]
        lsr     r9, r3, #3
        mov     r3, r0
        eor     r3, r3, r2
        rsb     r9, r9, r9, lsl #4
        tst     r3, r10
        sub     r9, fp, r9
        bne     .L29
        ldrh    r2, [sp, #176]
        ldrh    r3, [r4, #176]
        eor     r2, r2, r3
        ubfx    r2, r2, #0, #12
        cmp     r2, #0
        bne     .L29
        cmp     r9, r9
        bne     .L29
        mla     r6, r8, r6, lr
        ldr     fp, .L79+36
        mla     lr, r8, r6, lr
        ubfx    r6, r6, #16, #11
        bfi     r3, r6, #0, #12
        strh    r3, [r4, #176]      @ movhi
        uxth    r8, r3
        ldm     fp, {r0, r1, r2, r3}
        ubfx    ip, lr, #16, #11
        add     r7, r6, ip
        add     ip, sp, #176
        bfi     r8, r7, #0, #12
        str     lr, [r5]            <= r5 invalid

reload:

(insn 6826 3453 6816 129 (parallel [
            (set (reg:SI 3 r3 [4187])
                (truncate:SI (lshiftrt:DI (mult:DI (zero_extend:DI (reg:SI 11 fp [4186]))
                            (zero_extend:DI (reg:SI 7 r7 [4188])))
                        (const_int 32 [0x20]))))
            (clobber (reg:SI 5 r5 [5970]))
        ]) "20040709-1.c":108:291 70 {*umulsi3_highpart_v6}
     (nil))
[...]
(insn 3509 3530 3531 132 (set (mem/c:SI (reg/f:SI 5 r5 [5715]) [1 s.5566D.5531+0 S4 A32])
        (reg:SI 14 lr [orig:692 D.6083 ] [692])) "20040709-1.c":13:5 654 {*arm_movsi_vfp}
     (nil))
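
For readers unfamiliar with the pattern, the RTL above computes the high
32 bits of an unsigned 32x32-bit multiply, and the ARM UMULL instruction
(operand order RdLo, RdHi, Rn, Rm) always writes the low half as well. That
low half is the clobbered scratch. A minimal C model follows; umull_model is
a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "umull RdLo, RdHi, Rn, Rm": the full 64-bit product of two
   unsigned 32-bit operands, split across two destination registers. */
void umull_model(uint32_t rn, uint32_t rm, uint32_t *rd_lo, uint32_t *rd_hi)
{
    assert(rd_lo != NULL && rd_hi != NULL);
    uint64_t product = (uint64_t)rn * (uint64_t)rm;
    *rd_lo = (uint32_t)product;          /* overwritten even if unwanted */
    *rd_hi = (uint32_t)(product >> 32);  /* the value the pattern wants */
}
```

In "umull r5, r3, r7, fp" the result of interest lands in r3 while r5
receives the unused low half, so whatever r5 held before (here the address
used by the surrounding str instructions) is lost.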
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

--- Comment #3 from rguenther at suse dot de ---
On Mon, 8 Jul 2019, clyon at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109
>
> --- Comment #2 from Christophe Lyon ---
> Removing the test*() calls from the end, the first failing one is testX().
> However, if I remove all the preceding ones, the test passes.

Ugh.  Not very much simplification.  I suppose trying to trim the
number of test() calls before testX() isn't possible?

> Using -fwhole-program instead of -flto has no effect: the test still fails.

That's good news OTOH and simplifies analysis.

> Adding a printf() call in check() also makes the test pass.

test##S you mean probably.  But yes, that's expected.

Given there's no regression with hard float, having testW () might
be important (it uses long double).  There may also be ABI differences
(sizeof (long double)) when switching between hard-float and soft-float?
Looking at a cross compiler, long double == double == 8 bytes.

Again I'm expecting a target issue here.  The rev. made a difference
in inlining because it removes fewer stores as redundant during
early optimizations.  testN is no longer inlined.
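
The long double question raised above can be checked on any given target
with a few lines of C. This is just a sketch; long_double_size and
long_double_is_double are hypothetical helper names, and the result depends
entirely on the ABI the code is compiled for.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* The C standard requires long double to be at least as precise as double. */
static_assert(LDBL_MANT_DIG >= DBL_MANT_DIG,
              "long double must not be narrower than double");

/* Storage size of long double under the current ABI, in bytes. */
size_t long_double_size(void)
{
    return sizeof(long double);
}

/* True if long double has the same size and precision as double,
   as observed for the ARM cross compiler in the comment above. */
bool long_double_is_double(void)
{
    return sizeof(long double) == sizeof(double)
           && LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

Compiling this once per configuration (for example soft-float and
hard-float) and comparing the results would show whether the two ABIs
differ in this respect.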
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109 --- Comment #2 from Christophe Lyon --- Removing the test*() calls from the end, the first failing one is testX(). However, if I remove all the preceding ones, the test passes. Using -fwhole-program instead of -flto has no effect: the test still fails. Adding a printf() call in check() also makes the test pass.
[Bug tree-optimization/91109] [10 regression][arm] gcc.c-torture/execute/20040709-1.c fails since r273135
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91109

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Target Milestone|---                         |10.0

--- Comment #1 from Richard Biener ---
Can you help and check which test* () call fails?  Also check whether
-fwhole-program instead of -flto makes it fail.  Does it still fail when
you comment all but the failing test* () call?