[Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #5 from CVS Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:8125b12f846b41f26e58c0fe3b218d654f65d1c8

commit r14-2730-g8125b12f846b41f26e58c0fe3b218d654f65d1c8
Author: Roger Sayle
Date:   Sat Jul 22 21:52:55 2023 +0100

i386: Don't use insvti_{high,low}part with -O0 (for compile-time).

This patch attempts to help with PR rtl-optimization/110587, a regression
of -O0 compile time for the pathological pr28071.c. My recent patch helps
a bit, but hasn't returned -O0 compile-time to where it was before my
ix86_expand_move changes. The obvious solution/workaround is to guard
these new TImode parameter passing optimizations with "&& optimize", so
they don't trigger when compiling with -O0. The very minor complication
is that "&& optimize" alone leads to the regression of pr110533.c, where
our improved TImode parameter passing fixes a wrong-code issue with naked
functions, importantly, when compiling with -O0. This should explain the
one line fix below "&& (optimize || ix86_function_naked (cfun))".

I've an additional fix/tweak or two for this compile-time issue, but
this change eliminates the part of the regression that I've caused.

2023-07-22  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_expand_move): Disable the
        64-bit insertions into TImode optimizations with -O0, unless
        the function has the "naked" attribute (for PR target/110533).
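For readers following along, the effect of that one-line guard can be sketched as a plain predicate. This is only an illustrative model and not the code in i386-expand.cc: the function name use_timode_insert_opt and its parameters are hypothetical stand-ins for GCC's "optimize" level and for the result of ix86_function_naked (cfun).

```c
#include <stdbool.h>

/* Hypothetical model of the guard
     && (optimize || ix86_function_naked (cfun))
   optimize_level stands in for GCC's global "optimize";
   is_naked stands in for the result of ix86_function_naked (cfun).  */
static bool
use_timode_insert_opt (int optimize_level, bool is_naked)
{
  /* Enabled whenever optimizing, and also at -O0 for naked functions,
     so that the pr110533.c wrong-code fix is kept.  */
  return optimize_level != 0 || is_naked;
}
```

Under this model, an ordinary function compiled with -O0 skips the new expansion (which is what recovers the compile time), while a naked function still gets it at any optimization level.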
[Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #4 from CVS Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:bdf2737cda53a83332db1a1a021653447b05a7e7

commit r14-2386-gbdf2737cda53a83332db1a1a021653447b05a7e7
Author: Roger Sayle
Date:   Fri Jul 7 20:39:58 2023 +0100

i386: Improve __int128 argument passing (in ix86_expand_move).

Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
result in surprising code. Consider the example below (from PR 43644):

    unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
    {
      return x+y;
    }

which currently results in 6 consecutive movq instructions:

    foo:    movq    %rsi, %rax
            movq    %rdi, %rsi
            movq    %rdx, %rcx
            movq    %rax, %rdi
            movq    %rsi, %rax
            movq    %rdi, %rdx
            addq    %rcx, %rax
            adcq    $0, %rdx
            ret

The underlying issue is that during RTL expansion, we generate the
following initial RTL for the x argument:

    (insn 4 3 5 2 (set (reg:TI 85)
            (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
            (reg:DI 87)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
            (reg:TI 85)) "pr43644-2.c":5:1 -1
         (nil))

which by combine/reload becomes

    (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
            (const_int 0 [0])) "pr43644-2.c":5:1 -1
         (nil))
    (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
            (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 93)
            (nil)))
    (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
            (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 94)
            (nil)))

where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.

The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move.
For insn 4, which sets a TImode destination from a paradoxical SUBREG, to
assign the lowpart, we can use an explicit zero extension
(zero_extendditi2 was added in July 2022), and for insn 5, which sets
the highpart of a TImode register, we can use the *insvti_highpart_1
instruction (that was added in May 2023, after being approved for
stage1 in January). This allows combine to work its magic, merging
these insns into a *concatditi3 and from there into other optimized
forms.

So for the test case above, we now generate only a single movq:

    foo:    movq    %rdx, %rax
            xorl    %edx, %edx
            addq    %rdi, %rax
            adcq    %rsi, %rdx
            ret

But there is a little bad news. This patch causes two (minor) missed
optimization regressions on x86_64: gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c. As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl. For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting. These issues are easier to explain and fix once this patch
is in the tree.

The good news is that this approach fixes a number of long-standing
issues that need to be checked in bugzilla, including PR target/110533,
which was just opened/reported earlier this week.

2023-07-07  Roger Sayle

gcc/ChangeLog
        PR target/43644
        PR target/110533
        * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
        TImode destinations from paradoxical SUBREGs (setting the
        lowpart) into explicit zero extensions. Use *insvti_highpart_1
        instruction to set the highpart of a TImode destination.

gcc/testsuite/ChangeLog
        PR target/43644
        PR target/110533
        * gcc.target/i386/pr110533.c: New test case.
        * gcc.target/i386/pr43644-2.c: Likewise.
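To make the two special cases concrete, here is a small C model of what the new expansion computes for foo. It is a sketch under assumptions, not GCC code: concat_di_ti and foo_by_halves are hypothetical helper names. The low half is zero-extended first (in the spirit of zero_extendditi2), the high half is then inserted (in the spirit of *insvti_highpart_1), and the addition is written as an add followed by an add-with-carry, like the addq/adcq pair in the generated code. It relies on the unsigned __int128 extension, so it assumes GCC or Clang on a 64-bit target.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Build a 128-bit value from two 64-bit halves.
   Step 1: zero-extend the low half to 128 bits.
   Step 2: insert the high half; the upper 64 bits are zero after
   step 1, so an OR is enough to set them.  */
static u128
concat_di_ti (uint64_t lo, uint64_t hi)
{
  u128 x = (u128) lo;
  x |= (u128) hi << 64;
  return x;
}

/* Reference: the function from the commit message.  */
static u128
foo (u128 x, unsigned long long y)
{
  return x + y;
}

/* The same computation on 64-bit halves: add the low parts, then add
   the carry into the high part.  */
static u128
foo_by_halves (uint64_t x_lo, uint64_t x_hi, uint64_t y)
{
  uint64_t lo = x_lo + y;
  uint64_t carry = lo < y;      /* unsigned wraparound means carry out */
  uint64_t hi = x_hi + carry;
  return concat_di_ti (lo, hi);
}
```

For any inputs, foo_by_halves (lo, hi, y) should agree with foo (concat_di_ti (lo, hi), y), which is the equivalence the shorter instruction sequence depends on.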
[Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-07-06
     Ever confirmed|0                           |1
                 CC|                            |roger at nextmovesoftware dot com
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Roger Sayle ---
The patch recently proposed at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html
would resolve this issue.
[Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #2 from Uroš Bizjak ---
(In reply to Andrew Pinski from comment #1)
> >clobbering other parameters and callee-saved registers.
>
>
> (insn 2 8 3 2 (set (reg:DI 84)
>         (reg:DI 5 di [ aD.2522 ])) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 3 2 4 2 (set (reg:DI 85)
>         (reg:DI 4 si [ aD.2522+8 ])) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 4 3 5 2 (set (reg:TI 83)
>         (subreg:TI (reg:DI 84) 0)) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 5 4 6 2 (set (subreg:DI (reg:TI 83) 8)
>         (reg:DI 85)) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 6 5 7 2 (set (reg/v:TI 82 [ aD.2522 ])
>         (reg:TI 83)) "/app/example.cpp":3:25 -1
>      (nil))

This is emitted by the middle-end to reconstruct a function argument in
cases when the argument is passed via multiple registers. The function
argument is specified by the same hook that is used for the caller and
the callee, so it is not possible to simply disable it for naked
functions.

When -O2 is used, the optimizers figure out that the reconstructed value
is unused and remove the whole reconstruction sequence. This is
unfortunately not the case with -O0.

So, the middle-end should provide some sort of mechanism to suppress the
generation of the reconstruction sequence. The above sequence is emitted
in function.cc/assign_parm_remove_parallels, but similar functionality
can probably be found elsewhere in the function handling code.

Or only pass simple arguments to naked functions. Naked functions are a
specialist's tool, not intended for the "general public".
[Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #1 from Andrew Pinski ---
>clobbering other parameters and callee-saved registers.

(insn 2 8 3 2 (set (reg:DI 84)
        (reg:DI 5 di [ aD.2522 ])) "/app/example.cpp":3:25 -1
     (nil))
(insn 3 2 4 2 (set (reg:DI 85)
        (reg:DI 4 si [ aD.2522+8 ])) "/app/example.cpp":3:25 -1
     (nil))
(insn 4 3 5 2 (set (reg:TI 83)
        (subreg:TI (reg:DI 84) 0)) "/app/example.cpp":3:25 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 83) 8)
        (reg:DI 85)) "/app/example.cpp":3:25 -1
     (nil))
(insn 6 5 7 2 (set (reg/v:TI 82 [ aD.2522 ])
        (reg:TI 83)) "/app/example.cpp":3:25 -1
     (nil))