[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #18 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:4b5ef857f5faf09f274c0a95c67faaa80d198124 commit r13-6894-g4b5ef857f5faf09f274c0a95c67faaa80d198124 Author: Jakub Jelinek Date: Tue Mar 28 10:43:08 2023 +0200 i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276] The following testcase ICEs since r11-2259 because assign_386_stack_local -> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment for DImode temporaries rather than 32-bit as before. Most of the spots in the backend which ask for DImode temporaries are during expansion and those apparently are handled fine with -m32 -mpreferred-stack-boundary=2, we dynamically realign the stack in that case (most of the spots actually don't need that alignment but at least one does), then 2 spots are in STV which I assume also work correctly. But during splitting we can create a DImode slot which doesn't need to be 64-bit alignment (it is nicer for performance though), when we apparently aren't able to detect it for dynamic stack realignment purposes. The following patch just makes the slot 32-bit aligned in that rare case. 2023-03-28 Jakub Jelinek PR target/109276 * config/i386/i386.cc (assign_386_stack_local): For DImode with SLOT_FLOATxFDI_387 and -m32 -mpreferred-stack-boundary=2 pass align 32 rather than 0 to assign_stack_local. * gcc.target/i386/pr109276.c: New test.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #17 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #16) > (In reply to Uroš Bizjak from comment #15) > > (In reply to Jakub Jelinek from comment #13) > > > asks for a DImode stack slot, ix86_local_alignment newly doesn't lower the > > > alignment > > > which isn't good for -mpreferred-stack-boundary=2. > > > > IIRC, DImode FILD/FIST(P) instructions execute faster when memory is aligned > > to 64 bits. *If* it is possible, 64 bit alignment is preferred. > > Well, user who uses -mpreferred-stack-boundary=2 probably doesn't care about > performance that much. I guess I could also check whether we've already > decided to dynamically reallocate the stack for some other reasons and only > use 32-bit alignment if we haven't. > But there are too many such flags, so not sure what to check. No need to complicate things too much, I agree that by using -mpreferred-stack-boundary=2 user made a compromise w.r.t. performance.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #16 from Jakub Jelinek --- (In reply to Uroš Bizjak from comment #15) > (In reply to Jakub Jelinek from comment #13) > > asks for a DImode stack slot, ix86_local_alignment newly doesn't lower the > > alignment > > which isn't good for -mpreferred-stack-boundary=2. > > IIRC, DImode FILD/FIST(P) instructions execute faster when memory is aligned > to 64 bits. *If* it is possible, 64 bit alignment is preferred. Well, user who uses -mpreferred-stack-boundary=2 probably doesn't care about performance that much. I guess I could also check whether we've already decided to dynamically reallocate the stack for some other reasons and only use 32-bit alignment if we haven't. But there are too many such flags, so not sure what to check.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #15 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #13) > asks for a DImode stack slot, ix86_local_alignment newly doesn't lower the > alignment > which isn't good for -mpreferred-stack-boundary=2. IIRC, DImode FILD/FIST(P) instructions execute faster when memory is aligned to 64 bits. *If* it is possible, 64 bit alignment is preferred.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #14 from Jakub Jelinek --- Created attachment 54769 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54769&action=edit gcc13-pr109276.patch So, I wonder if for the local DImode temporaries we just couldn't ask for 32-bit alignment with -m32 -mpreferred-stack-alignment=2. I went through all the assign_386_stack_local calls which do or can pass DImode to it: - 2 spots in sync.md, both for a MEM scratch which I don't see a reason why it couldn't be just 32-bit aligned, they are during expansion, SLOT_TEMP - the SLOT_FLOATxFDI_387 use in the splitter used in this PR, that one also looks ok to be 32-bit aligned, during splitting - floatunssi2 also during expansion, SLOT_TEMP, this one I'm afraid really wants the slot to be 64-bit aligned - in ix86_expand_divmod_libfunc SLOT_TEMP, also during expansion, I think that one would be ok to use 32-bit aligned - scalar_chain::make_vector_copies and scalar_chain::convert_reg, during STV, SLOT_STV_TEMP, I think these want 64-bit alignment
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #13 from Jakub Jelinek --- Neither the new ix86_lower_local_decl_alignment function is called, nor SET_DECL_ALIGN in pass_adjust_alignment::execute. What happens is that #0 ix86_local_alignment (exp=, mode=E_DImode, align=64, may_lower=false) at ../../gcc/config/i386/i386.cc:17499 #1 0x00883aab in get_stack_local_alignment (type=, mode=E_DImode) at ../../gcc/function.cc:290 #2 0x00883d8c in assign_stack_local_1 (mode=E_DImode, size=..., align=0, kind=2) at ../../gcc/function.cc:388 #3 0x0088457b in assign_stack_local (mode=E_DImode, size=..., align=0) at ../../gcc/function.cc:547 #4 0x0120a3e7 in assign_386_stack_local (mode=E_DImode, n=SLOT_FLOATxFDI_387) at ../../gcc/config/i386/i386.cc:16688 #5 0x01b3c96f in gen_split_72 (curr_insn=0x7fffea1452c0, operands=0x35183e0 ) at ../../gcc/config/i386/i386.md:5723 #6 0x021b7b7c in split_9 (x1=0x7fffea2fb6d8, insn=0x7fffea1452c0) at ../../gcc/config/i386/i386.md:5722 #7 0x0229f190 in split_insns (x1=0x7fffea2fb6d8, insn=0x7fffea1452c0) at ../../gcc/config/i386/i386.md:16795 #8 0x0079c159 in try_split (pat=0x7fffea2fb6d8, trial=0x7fffea1452c0, last=1) at ../../gcc/emit-rtl.cc:3799 asks for a DImode stack slot, ix86_local_alignment newly doesn't lower the alignment which isn't good for -mpreferred-stack-boundary=2.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #12 from Jakub Jelinek --- The above testcase only started ICEing with r11-2259-g0a9d711df36b42b6494b73 , so I think the IPA pass is innocent here.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #11 from Jakub Jelinek --- Reduced testcase: /* PR target/109276 */ /* { dg-do compile } */ /* { dg-options "-march=x86-64" } */ /* { dg-additional-options "-mpreferred-stack-boundary=2" } */ long long a; long double b; void foo (void) { b += a; }
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #10 from H.J. Lu --- (In reply to Jakub Jelinek from comment #7) > This started to ICE with r11-508-gdfa4fcdba374ed44 in the newly added pass > there, > and since r11-2259-g0a9d711df36b42b6494b73 it ICEs similarly like on the > trunk. The new pass should just stack alignment requirements when alignment of stack variable is increased.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-03-24 --- Comment #9 from Andrew Pinski --- Confirmed.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 --- Comment #8 from Andrew Pinski --- A workaround is to do: long long int digit __attribute__((aligned(8))); Note manually setting the digit alignment to 4 does not cause an ICE for GCC 10.4 either.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 Jakub Jelinek changed: What|Removed |Added CC||hjl.tools at gmail dot com, ||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- This started to ICE with r11-508-gdfa4fcdba374ed44 in the newly added pass there, and since r11-2259-g0a9d711df36b42b6494b73 it ICEs similarly like on the trunk.
[Bug target/109276] [11/12/13 Regression] ICE: in assign_stack_local_1, at function.cc:429 with -mpreferred-stack-boundary=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109276 Andrew Pinski changed: What|Removed |Added Known to fail||11.1.0, 13.0 Summary|ICE: in |[11/12/13 Regression] ICE: |assign_stack_local_1, at|in assign_stack_local_1, at |function.cc:429 with|function.cc:429 with |-mpreferred-stack-boundary= |-mpreferred-stack-boundary= |2 |2 Target Milestone|--- |11.4 Keywords||ice-on-valid-code Known to work||10.1.0, 10.4.0, 9.5.0