[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Richard Biener changed:

            What|Removed                       |Added
----------------------------------------------------------------------
          Status|NEW                           |ASSIGNED
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool changed:

            What|Removed                       |Added
----------------------------------------------------------------------
          Status|ASSIGNED                      |NEW

--- Comment #6 from Segher Boessenkool ---
There is now generic code (in trunk and GCC 7) for -fshrink-wrap-separate; for
this to do anything on x86, someone who understands the i386 backend will have
to write an implementation of the target hooks.
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Andrew Pinski changed:

            What|Removed                       |Added
----------------------------------------------------------------------
        Keywords|                              |missed-optimization
              CC|                              |pinskia at gcc dot gnu.org
        Severity|normal                        |enhancement
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool changed:

            What|Removed                       |Added
----------------------------------------------------------------------
          Status|NEW                           |ASSIGNED
        Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #4 from Andy Lutomirski ---
I don't want to comment on how code generation works in GCC, but in terms of
what works in the output:

x86_64 generally has a 16-byte stack alignment in user code, which is two
slots. (In the kernel, we use 8-byte alignment, since we don't use SSE/AVX.
This means that the stack is always aligned appropriately.)

For builds with frame pointers on, merely pushing %rbp aligns the stack, so
splitting out the 'push %rbp' from the rest of the pushes doesn't leave an
unaligned window. With frame pointers off, doing any odd number of pushes will
similarly align the stack.

For functions in which there's a control flow path from the beginning to the
end that calls nothing, the alignment is irrelevant on that path unless there's
a live variable on the stack that needs 16-byte or higher alignment.
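The parity argument above can be checked with a little arithmetic. The following is only a sketch under one stated assumption: %rsp is 16-byte aligned immediately before the call instruction that enters the function, so the 8-byte return address leaves it one slot off. The helper name stack_aligned_for_call is made up for illustration and is not anything in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* One stack slot (a push or a return address) is 8 bytes on x86-64;
   user-space code expects 16-byte alignment at each call. */
enum { SLOT_BYTES = 8, ABI_ALIGN = 16 };

/* Hypothetical helper: assuming %rsp was 16-byte aligned just before the
   call that entered this function, report whether it is 16-byte aligned
   again after `pushes` register pushes plus an explicit `sub_bytes`
   adjustment (e.g. 8 for "subq $8, %rsp"). */
static bool stack_aligned_for_call(unsigned pushes, unsigned sub_bytes)
{
    /* Only whole-slot adjustments are modelled here. */
    assert(sub_bytes % SLOT_BYTES == 0);

    unsigned used = SLOT_BYTES             /* return address */
                  + pushes * SLOT_BYTES    /* pushed registers */
                  + sub_bytes;             /* explicit adjustment */
    return used % ABI_ALIGN == 0;
}
```

Under that assumption, a lone 'push %rbp' (one push, no adjustment) gives an aligned stack, while two pushes need the extra 8-byte adjustment before a call.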
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #5 from Segher Boessenkool ---
The prologue does a lot of separate things:
- Save non-volatile registers;
- Do whatever needs to be done to be able to call things (save the return
  address, align the stack, whatever; different per target);
- Set up a stack frame;
- Do whatever needs to be done for the static chain;
- Set up registers for PIC;
- Etc.

Not all of those can be separated for every target. There is also a required
ordering between them, which differs per target as well. Doing several of
those together may also be cheaper (say, pushing registers to set up the stack
frame or to align the stack).

GCC does not yet have any way to ask the backend to split the prologue into
such separate pieces. It isn't clear to me what a good interface would be.

Relatedly, there are cases where it would be useful to insert (pieces of) the
prologue at multiple points, not at a common dominator of all the blocks that
need it. For example, a function that requires no prologue at all except on
no-return error paths (which need the backchain saved for backtracing to
work). What good heuristics for that would be is unclear as well.
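As a toy illustration of why splitting the prologue into pieces could pay off, the sketch below models each piece as a bit flag and compares what an unsplit prologue must do on entry with what a single path actually needs. This is not GCC's algorithm or interface: the enum, the function names, and the block numbering are all hypothetical, and the model ignores the ordering constraints and combined-cost effects mentioned above.

```c
#include <stddef.h>

/* Hypothetical flags, one per prologue task from the list above. */
enum prologue_component {
    PC_SAVE_REGS    = 1u << 0,  /* save non-volatile registers */
    PC_CALL_SETUP   = 1u << 1,  /* whatever is needed to make calls */
    PC_FRAME_SETUP  = 1u << 2,  /* set up a stack frame */
    PC_STATIC_CHAIN = 1u << 3,  /* static chain handling */
    PC_PIC_SETUP    = 1u << 4   /* PIC register setup */
};

/* Union over every block in the function: what a single, unsplit
   prologue at function entry has to do, whichever path is taken. */
static unsigned whole_function_needs(const unsigned *block_needs,
                                     size_t nblocks)
{
    unsigned all = 0;
    for (size_t i = 0; i < nblocks; i++)
        all |= block_needs[i];
    return all;
}

/* Union over the blocks on one path (given as block indices): the least
   a split prologue would have to execute along that path. */
static unsigned path_needs(const unsigned *block_needs,
                           const size_t *path, size_t path_len)
{
    unsigned on_path = 0;
    for (size_t i = 0; i < path_len; i++)
        on_path |= block_needs[path[i]];
    return on_path;
}
```

For a function whose only demanding block is a rarely executed one, whole_function_needs reports every component of that block, while path_needs for the common path reports none; that gap is the work separate placement could avoid.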
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool changed:

            What|Removed                       |Added
----------------------------------------------------------------------
          Status|UNCONFIRMED                   |NEW
Last reconfirmed|                              |2015-10-05
              CC|                              |segher at gcc dot gnu.org
  Ever confirmed|0                             |1

--- Comment #1 from Segher Boessenkool ---
The call to "a" needs the prologue, maybe to align the stack?
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #2 from Andy Lutomirski ---
(In reply to Segher Boessenkool from comment #1)
> The call to "a" needs the prologue, maybe to align the stack?

The "subq $8, %rsp" is for stack alignment, and whether it's emitted depends
on the parity of the number of pushes. I have no problem with it.

The problem is that rbx and rbp are pushed. They shouldn't be. In a real
function, it's worse: rbx, rbp, r12, r13, r14, and r15 all get pushed
unnecessarily.

I just had a bunch of fun refactoring my big Linux entry code rewrite to
minimize the amount of pain that this issue causes.
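For readers without the original test case at hand, here is a hypothetical function with the shape being complained about. It is not the code from the bug report; process and slow_helper are made-up names. The common path returns without calling anything, while the rare path keeps several values live across calls, which is the kind of thing that typically makes a compiler reach for callee-saved registers.

```c
/* Counts trips through the slow path, so a caller can observe which
   path ran. */
static int slow_calls;

/* Hypothetical out-of-line helper. noinline (a GCC/Clang attribute)
   keeps the calls below as real calls. */
__attribute__((noinline)) static long slow_helper(long x, long y)
{
    slow_calls++;
    return x * 31 + y;
}

long process(long x, long y)
{
    /* Common case: no call is made, so no callee-saved register would
       need to be saved on this path. */
    if (x >= 0)
        return x + y;

    /* Rare case: x, y and a must survive across calls. */
    long a = slow_helper(x, y);
    long b = slow_helper(y, x);
    return a + b + x + y;
}
```

Comparing the assembly for such a function with and without -fshrink-wrap (for example via gcc -O2 -S) is one way to see where the register pushes end up on a given target and compiler version.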
[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #3 from Segher Boessenkool ---
Currently, shrink-wrapping does not allow any call to happen without a
prologue (see shrink-wrap.c:requires_stack_frame_p). On x86-64, if you do not
have a prologue but do make a call, the called function will be entered with
unexpected stack alignment, as far as I can see?

Letting shrink-wrapping do the non-volatile register saves separately from the
other things the prologue does requires first splitting the prologue (and the
epilogue too) into parts that each do one thing.