[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2021-05-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2017-06-26 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW

--- Comment #6 from Segher Boessenkool  ---
There now is generic code (in trunk and 7) for -fshrink-wrap-separate;
for this to do anything on x86, someone who understands the i386 backend
will have to write an implementation for the hooks.

[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2016-09-11 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |pinskia at gcc dot gnu.org
           Severity|normal                      |enhancement

[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2016-03-02 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2015-10-06 Thread luto at mit dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #4 from Andy Lutomirski  ---
I don't want to comment on how code generation works in GCC, but in terms of
what works in the output:

x86_64 generally has a 16-byte stack alignment in user code, which is two
slots.  (In the kernel, we use 8-byte alignment, since we don't use SSE/AVX.) 
This means that the stack is always aligned appropriately.

For builds with frame pointers on, merely pushing %rbp aligns the stack, so
splitting out the 'push %rbp' from the rest of the pushes doesn't leave an
unaligned window.

With frame pointers off, doing any odd number of pushes will similarly align
the stack.

For functions in which there's a control flow path from the beginning to the
end that calls nothing, the alignment is irrelevant unless there's a
16-byte or higher aligned live variable on the stack.
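
The parity argument above can be checked with a little arithmetic. What
follows is a minimal sketch in C, assuming the usual x86-64 user-space
convention that %rsp is 16-byte aligned immediately before a call
instruction, so it is 8 bytes off on function entry. The function and
constant names are made up for illustration; the 8-byte kernel alignment
mentioned above is not modelled.

```c
/* Assumption (x86-64 user code): %rsp is 16-byte aligned just before a
   `call`, so on function entry, after the 8-byte return address has been
   pushed, %rsp % 16 == 8. */
enum { SLOT = 8, ALIGN = 16, ENTRY_MISALIGN = 8 };

/* Misalignment of %rsp, in bytes modulo 16, after `pushes` 8-byte pushes
   following function entry.  0 means a nested call sees an aligned stack. */
static unsigned misalign_after_pushes(unsigned pushes)
{
    return (ENTRY_MISALIGN + pushes * SLOT) % ALIGN;
}

/* Extra bytes a prologue would have to subtract from %rsp to restore
   16-byte alignment after `pushes` pushes. */
static unsigned padding_needed(unsigned pushes)
{
    return (ALIGN - misalign_after_pushes(pushes)) % ALIGN;
}
```

Under these assumptions an odd number of pushes (for example a lone
'push %rbp') needs no padding, while an even number, including zero, needs
8 bytes.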


[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2015-10-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #5 from Segher Boessenkool  ---
The prologue does a lot of separate things:
- Save non-volatile registers;
- Do whatever needs to be done to be able to call things (save the
return address, align the stack, whatever; different per target);
- Set up a stack frame;
- Do whatever needs to be done for the static chain;
- Set up registers for PIC;
- Etc.

Not all of those can be separated for every target.  There also is
a required ordering between them, different per target as well.
Doing several of those together may be cheaper as well (say, pushing
registers to set up the stack frame or aligned stack).

GCC does not yet have any way to ask the backend to split the prologue
into such separate pieces.  It isn't clear to me what a good interface
would be.

Relatedly, there are cases where it would be useful to insert (pieces
of) the prologue at multiple points, not at a common dominator of all
that need it.  For example, a function that requires no prologue at
all except for no-return error paths (that need the backchain saved
for backtracing to work).  What would be good heuristics for that is
unclear as well.
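
To make the idea of placing prologue pieces at several points concrete,
here is a toy model in C. It is only a sketch under strong assumptions:
the control flow graph is a tree (every block has at most one parent), the
required ordering between pieces is ignored, and every name in it
(prologue_part, block, parts_to_emit, example_cfg) is hypothetical. It is
not GCC's interface or algorithm.

```c
/* Hypothetical prologue pieces, one bit each (illustrative names only). */
enum prologue_part {
    PART_SAVE_REGS    = 1u << 0,  /* save non-volatile registers */
    PART_CALL_SETUP   = 1u << 1,  /* return address, stack alignment, ... */
    PART_STACK_FRAME  = 1u << 2,  /* set up a stack frame */
    PART_STATIC_CHAIN = 1u << 3,  /* static chain handling */
    PART_PIC          = 1u << 4   /* PIC register setup */
};

/* Toy basic block: index of its single parent (-1 for the entry block)
   and the set of prologue pieces the block itself needs. */
struct block {
    int parent;
    unsigned needs;
};

/* Pieces to emit on entry to block `b`: those it needs that no ancestor
   on the path from the entry block already needs (and so already emitted). */
static unsigned parts_to_emit(const struct block *blocks, int b)
{
    unsigned have = 0;
    for (int p = blocks[b].parent; p >= 0; p = blocks[p].parent)
        have |= blocks[p].needs;
    return blocks[b].needs & ~have;
}

/* Example: block 0 is the entry, block 1 a fast return, blocks 2 and 3
   are separate paths that make calls, block 4 continues block 3. */
static const struct block example_cfg[] = {
    { -1, 0 },
    {  0, 0 },
    {  0, PART_CALL_SETUP },
    {  0, PART_CALL_SETUP | PART_SAVE_REGS },
    {  3, PART_SAVE_REGS },
};
```

In this example the call setup is emitted twice, once in block 2 and once
in block 3, instead of once in their common dominator (block 0), so the
fast return through block 1 runs no prologue code at all.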


[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2015-10-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

Segher Boessenkool  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-10-05
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Segher Boessenkool  ---
The call to "a" needs the prologue, maybe to align the stack?


[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2015-10-05 Thread luto at mit dot edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #2 from Andy Lutomirski  ---
(In reply to Segher Boessenkool from comment #1)
> The call to "a" needs the prologue, maybe to align the stack?

The "subq $8, %rsp" is for stack alignment, and whether it's emitted depends on
the parity of the number of pushes.  I have no problem with it.

The problem is that rbx and rbp are pushed.  They shouldn't be.  In a real
function, it's worse: rbx, rbp, r12, r13, r14, and r15 all get pushed
unnecessarily.  I just had a bunch of fun refactoring my big Linux entry code
rewrite to minimize the amount of pain that this issue causes.
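
For readers without the report's test case at hand, the general shape of
the problem can be sketched in C as below. This is a hypothetical
illustration, not the reporter's code: slow_sum and sum_twice_plus are
invented names, and whether a particular compiler version saves
callee-saved registers ahead of the early return depends on the version,
target, and flags.

```c
#include <stddef.h>

/* Hypothetical slow-path helper.  `noinline` (a GCC/Clang attribute)
   keeps it as a real call in the caller. */
__attribute__((noinline)) static long slow_sum(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Fast path: returns immediately, makes no calls, and needs no
   callee-saved registers.
   Slow path: `v`, `n`, and `bias` stay live across a call, so a compiler
   will typically keep them in callee-saved registers, which then have to
   be saved somewhere before they are used. */
long sum_twice_plus(const long *v, size_t n, long bias)
{
    if (n == 0)
        return bias;                 /* fast path */

    long first = slow_sum(v, n);     /* slow path */
    long second = slow_sum(v, n);
    return first + second + bias;
}
```

The complaint in this comment corresponds to the register pushes being
emitted at function entry, where the fast path also pays for them, rather
than only on the slow path.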


[Bug rtl-optimization/67856] callee-saved register saves should be shrink-wrapped

2015-10-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67856

--- Comment #3 from Segher Boessenkool  ---
Currently, shrink-wrapping does not allow any call to happen without a
prologue (see shrink-wrap.c:requires_stack_frame_p).

On x86-64, if you do not have a prologue but do make a call, the called
function will be entered with unexpected stack alignment, as far as I
can see?

Letting shrink-wrapping do the non-volatile register save separately
from the other things the prologue does requires first splitting the
prologue into parts, each doing one thing (and the epilogue too).