Re: PING: [Updated, PATCH] i386: Avoid stack realignment if possible

2017-09-05 Thread H.J. Lu
On Fri, Sep 1, 2017 at 11:48 AM, H.J. Lu  wrote:
> On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu  wrote:
>> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
>>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  
>>> > wrote:
>>> > > Hi!
>>> > >
>>> > > Honza recently changed the i?86 backend, so that it often doesn't
>>> > > do -maccumulate-outgoing-args by default on x86_64.
>>> > > Unfortunately, on some of the here included testcases this regressed
>>> > > quite a bit the generated code.  As AVX vectors are used, the 
>>> > > dynamic
>>> > > realignment code needs to assume e.g. that some of them will need 
>>> > > to be
>>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > > need_drap early as well.  But when emitting the 
>>> > > prologue/epilogue,
>>> > > if need_drap is set, we don't perform the optimization for leaf 
>>> > > functions
>>> > > which have zero size stack frame, thus we end up with uselessly 
>>> > > doing
>>> > > dynamic stack realignment, setting up DRAP that nothing uses and 
>>> > > later on
>>> > > restore everything back.
>>> > >
>>> > > This patch improves it, if the DRAP register isn't live at the 
>>> > > start of
>>> > > entry bb successor and we aren't going to realign the stack, we 
>>> > > don't
>>> > > need DRAP at all, and even if we need DRAP register, that can't be 
>>> > > the sole
>>> > > reason for doing stack realignment, the prologue code is able to 
>>> > > set up DRAP
>>> > > even without dynamic stack realignment.
>>> > >
>>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> > >
>>> > > 2013-12-20  Jakub Jelinek  
>>> > >
>>> > > PR target/59501
>>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>>> > > drap_reg
>>> > > if !crtl->stack_realign_needed.
>>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live 
>>> > > on entry
>>> > > and stack_realign_needed will be false, clear drap_reg and 
>>> > > need_drap.
>>> > > Optimize leaf functions that don't need stack frame even if
>>> > > crtl->need_drap.
>>> > >
>>> > > * gcc.target/i386/pr59501-1.c: New test.
>>> > > * gcc.target/i386/pr59501-1a.c: New test.
>>> > > * gcc.target/i386/pr59501-2.c: New test.
>>> > > * gcc.target/i386/pr59501-2a.c: New test.
>>> > > * gcc.target/i386/pr59501-3.c: New test.
>>> > > * gcc.target/i386/pr59501-3a.c: New test.
>>> > > * gcc.target/i386/pr59501-4.c: New test.
>>> > > * gcc.target/i386/pr59501-4a.c: New test.
>>> > > * gcc.target/i386/pr59501-5.c: New test.
>>> > > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> > LGTM, assuming Jakub is OK with the patch.
>>> >
>>> > Thanks,
>>> > Uros.
>>>
>>> Jakub, can you take a look at this:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>>
>>
>> Here is the updated patch to fix
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> ix86_finalize_stack_frame_flags has been extended to eliminate frame
>> pointer when the new stack frame isn't needed with and without
>> -maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
>> access with larger alignment may be optimized out, to decide if stack
>> realignment is needed, we need to not only check for stack frame access,
>> but also verify the alignment of stack frame access.  Since alignment of
>> memory access via arg_pointer is set up by caller, not by callee, we
>> should find the maximum stack alignment from the stack frame access
>> instructions via stack pointer and frame pointer to avoid stack
>> realignment when stack alignment needed is less than incoming stack
>> boundary.
>>
>> gcc/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
>> realign stack if stack alignment needed is less than incoming
>> stack boundary.
>>
>> gcc/testsuite/
>>
>> PR target/59501
>> PR target/81624
>> PR target/81769
>> * gcc.target/i386/pr59501-4a.c: Remove xfail.
>> * gcc.target/i386/pr81769-1a.c: New test.
>> * gcc.target/i386/pr81769-1b.c: 

PING: [Updated, PATCH] i386: Avoid stack realignment if possible

2017-09-01 Thread H.J. Lu
On Sun, Aug 13, 2017 at 3:02 PM, H.J. Lu  wrote:
> On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
>> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
>> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  
>> > wrote:
>> > > Hi!
>> > >
>> > > Honza recently changed the i?86 backend, so that it often doesn't
>> > > do -maccumulate-outgoing-args by default on x86_64.
>> > > Unfortunately, on some of the here included testcases this regressed
>> > > quite a bit the generated code.  As AVX vectors are used, the dynamic
>> > > realignment code needs to assume e.g. that some of them will need to 
>> > > be
>> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>> > > need_drap early as well.  But when emitting the prologue/epilogue,
>> > > if need_drap is set, we don't perform the optimization for leaf 
>> > > functions
>> > > which have zero size stack frame, thus we end up with uselessly doing
>> > > dynamic stack realignment, setting up DRAP that nothing uses and 
>> > > later on
>> > > restore everything back.
>> > >
>> > > This patch improves it, if the DRAP register isn't live at the start 
>> > > of
>> > > entry bb successor and we aren't going to realign the stack, we don't
>> > > need DRAP at all, and even if we need DRAP register, that can't be 
>> > > the sole
>> > > reason for doing stack realignment, the prologue code is able to set 
>> > > up DRAP
>> > > even without dynamic stack realignment.
>> > >
>> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> > >
>> > > 2013-12-20  Jakub Jelinek  
>> > >
>> > > PR target/59501
>> > > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>> > > drap_reg
>> > > if !crtl->stack_realign_needed.
>> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live 
>> > > on entry
>> > > and stack_realign_needed will be false, clear drap_reg and 
>> > > need_drap.
>> > > Optimize leaf functions that don't need stack frame even if
>> > > crtl->need_drap.
>> > >
>> > > * gcc.target/i386/pr59501-1.c: New test.
>> > > * gcc.target/i386/pr59501-1a.c: New test.
>> > > * gcc.target/i386/pr59501-2.c: New test.
>> > > * gcc.target/i386/pr59501-2a.c: New test.
>> > > * gcc.target/i386/pr59501-3.c: New test.
>> > > * gcc.target/i386/pr59501-3a.c: New test.
>> > > * gcc.target/i386/pr59501-4.c: New test.
>> > > * gcc.target/i386/pr59501-4a.c: New test.
>> > > * gcc.target/i386/pr59501-5.c: New test.
>> > > * gcc.target/i386/pr59501-6.c: New test.
>> >
>> > LGTM, assuming Jakub is OK with the patch.
>> >
>> > Thanks,
>> > Uros.
>>
>> Jakub, can you take a look at this:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>>
>
> Here is the updated patch to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> ix86_finalize_stack_frame_flags has been extended to eliminate frame
> pointer when the new stack frame isn't needed with and without
> -maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
> access with larger alignment may be optimized out, to decide if stack
> realignment is needed, we need to not only check for stack frame access,
> but also verify the alignment of stack frame access.  Since alignment of
> memory access via arg_pointer is set up by caller, not by callee, we
> should find the maximum stack alignment from the stack frame access
> instructions via stack pointer and frame pointer to avoid stack
> realignment when stack alignment needed is less than incoming stack
> boundary.
>
> gcc/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> realign stack if stack alignment needed is less than incoming
> stack boundary.
>
> gcc/testsuite/
>
> PR target/59501
> PR target/81624
> PR target/81769
> * gcc.target/i386/pr59501-4a.c: Remove xfail.
> * gcc.target/i386/pr81769-1a.c: New test.
> * gcc.target/i386/pr81769-1b.c: Likewise.
> * gcc.target/i386/pr81769-2.c: Likewise.
> ---
>  gcc/config/i386/i386.c | 143 ++---
>  gcc/testsuite/gcc.target/i386/pr59501-4a.c |   2 +-
>  

[Updated, PATCH] i386: Avoid stack realignment if possible

2017-08-13 Thread H.J. Lu
On Mon, Aug 07, 2017 at 08:58:49AM -0700, H.J. Lu wrote:
> On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
> > On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
> >> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
> >>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>  On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
> > On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
> > > Hi!
> > >
> > > Honza recently changed the i?86 backend, so that it often doesn't
> > > do -maccumulate-outgoing-args by default on x86_64.
> > > Unfortunately, on some of the here included testcases this regressed
> > > quite a bit the generated code.  As AVX vectors are used, the dynamic
> > > realignment code needs to assume e.g. that some of them will need to 
> > > be
> > > spilled, and for -mno-accumulate-outgoing-args the code needs to set
> > > need_drap early as well.  But when emitting the prologue/epilogue,
> > > if need_drap is set, we don't perform the optimization for leaf 
> > > functions
> > > which have zero size stack frame, thus we end up with uselessly doing
> > > dynamic stack realignment, setting up DRAP that nothing uses and 
> > > later on
> > > restore everything back.
> > >
> > > This patch improves it, if the DRAP register isn't live at the start 
> > > of
> > > entry bb successor and we aren't going to realign the stack, we don't
> > > need DRAP at all, and even if we need DRAP register, that can't be 
> > > the sole
> > > reason for doing stack realignment, the prologue code is able to set 
> > > up DRAP
> > > even without dynamic stack realignment.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > >
> > > 2013-12-20  Jakub Jelinek  
> > >
> > > PR target/59501
> > > * config/i386/i386.c (ix86_save_reg): Don't return true for 
> > > drap_reg
> > > if !crtl->stack_realign_needed.
> > > (ix86_finalize_stack_realign_flags): If drap_reg isn't live 
> > > on entry
> > > and stack_realign_needed will be false, clear drap_reg and 
> > > need_drap.
> > > Optimize leaf functions that don't need stack frame even if
> > > crtl->need_drap.
> > >
> > > * gcc.target/i386/pr59501-1.c: New test.
> > > * gcc.target/i386/pr59501-1a.c: New test.
> > > * gcc.target/i386/pr59501-2.c: New test.
> > > * gcc.target/i386/pr59501-2a.c: New test.
> > > * gcc.target/i386/pr59501-3.c: New test.
> > > * gcc.target/i386/pr59501-3a.c: New test.
> > > * gcc.target/i386/pr59501-4.c: New test.
> > > * gcc.target/i386/pr59501-4a.c: New test.
> > > * gcc.target/i386/pr59501-5.c: New test.
> > > * gcc.target/i386/pr59501-6.c: New test.
> >
> > LGTM, assuming Jakub is OK with the patch.
> >
> > Thanks,
> > Uros.
> 
> Jakub, can you take a look at this:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
> 

Here is the updated patch to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81769

OK for trunk?

Thanks.

H.J.
---
ix86_finalize_stack_frame_flags has been extended to eliminate frame
pointer when the new stack frame isn't needed with and without
-maccumulate-outgoing-args as well as -fomit-frame-pointer.  Since stack
access with larger alignment may be optimized out, to decide if stack
realignment is needed, we need to not only check for stack frame access,
but also verify the alignment of stack frame access.  Since alignment of
memory access via arg_pointer is set up by caller, not by callee, we
should find the maximum stack alignment from the stack frame access
instructions via stack pointer and frame pointer to avoid stack
realignment when stack alignment needed is less than incoming stack
boundary.
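
As a rough illustration of the rule described above, here is a simplified
model in C.  It is a sketch, not the code in the patch: the types and names
(stack_access, BASE_ARG_POINTER, max_frame_access_alignment,
needs_realignment) are hypothetical, alignments are given in bits, and
treating "equal to the incoming boundary" as not needing realignment is an
assumption of this sketch.  Accesses through the argument pointer are skipped
because the caller, not the callee, establishes their alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stack frame access; not a GCC data structure.  */
enum base_reg { BASE_STACK_POINTER, BASE_FRAME_POINTER, BASE_ARG_POINTER };

struct stack_access
{
  enum base_reg base;      /* Register the address is based on.  */
  unsigned int align_bits; /* Alignment the access requires, in bits.  */
};

/* Return the largest alignment required by accesses made via the stack
   pointer or frame pointer, but never less than MIN_ALIGN_BITS.  Accesses
   via the argument pointer are ignored: the caller set up their alignment.  */
unsigned int
max_frame_access_alignment (const struct stack_access *accesses, size_t n,
                            unsigned int min_align_bits)
{
  unsigned int max_align = min_align_bits;
  for (size_t i = 0; i < n; i++)
    if (accesses[i].base != BASE_ARG_POINTER
        && accesses[i].align_bits > max_align)
      max_align = accesses[i].align_bits;
  return max_align;
}

/* Realignment is needed only when the frame accesses require more
   alignment than the incoming stack boundary already guarantees.  */
bool
needs_realignment (const struct stack_access *accesses, size_t n,
                   unsigned int min_align_bits,
                   unsigned int incoming_boundary_bits)
{
  assert (incoming_boundary_bits != 0);
  return max_frame_access_alignment (accesses, n, min_align_bits)
         > incoming_boundary_bits;
}
```

In this model, a function whose only 256-bit access goes through the argument
pointer does not force realignment on a 128-bit incoming boundary, while a
256-bit spill addressed from the frame pointer does.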

gcc/

PR target/59501
PR target/81624
PR target/81769
* config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
realign stack if stack alignment needed is less than incoming
stack boundary.

gcc/testsuite/

PR target/59501
PR target/81624
PR target/81769
* gcc.target/i386/pr59501-4a.c: Remove xfail.
* gcc.target/i386/pr81769-1a.c: New test.
* gcc.target/i386/pr81769-1b.c: Likewise.
* gcc.target/i386/pr81769-2.c: Likewise.
---
 gcc/config/i386/i386.c | 143 ++---
 gcc/testsuite/gcc.target/i386/pr59501-4a.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr81769-1a.c |  21 +
 gcc/testsuite/gcc.target/i386/pr81769-1b.c |   7 ++
 gcc/testsuite/gcc.target/i386/pr81769-2.c  |  21 +
 5 files changed, 138 insertions(+), 56 deletions(-)
 create mode 100644 

PING^2: [PATCH] i386: Avoid stack realignment if possible

2017-08-07 Thread H.J. Lu
On Tue, Jul 25, 2017 at 7:54 AM, Uros Bizjak  wrote:
> On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
>> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
 On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
> > Hi!
> >
> > Honza recently changed the i?86 backend, so that it often doesn't
> > do -maccumulate-outgoing-args by default on x86_64.
> > Unfortunately, on some of the here included testcases this regressed
> > quite a bit the generated code.  As AVX vectors are used, the dynamic
> > realignment code needs to assume e.g. that some of them will need to be
> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
> > need_drap early as well.  But when emitting the prologue/epilogue,
> > if need_drap is set, we don't perform the optimization for leaf 
> > functions
> > which have zero size stack frame, thus we end up with uselessly doing
> > dynamic stack realignment, setting up DRAP that nothing uses and later 
> > on
> > restore everything back.
> >
> > This patch improves it, if the DRAP register isn't live at the start of
> > entry bb successor and we aren't going to realign the stack, we don't
> > need DRAP at all, and even if we need DRAP register, that can't be the 
> > sole
> > reason for doing stack realignment, the prologue code is able to set up 
> > DRAP
> > even without dynamic stack realignment.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2013-12-20  Jakub Jelinek  
> >
> > PR target/59501
> > * config/i386/i386.c (ix86_save_reg): Don't return true for 
> > drap_reg
> > if !crtl->stack_realign_needed.
> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
> > entry
> > and stack_realign_needed will be false, clear drap_reg and 
> > need_drap.
> > Optimize leaf functions that don't need stack frame even if
> > crtl->need_drap.
> >
> > * gcc.target/i386/pr59501-1.c: New test.
> > * gcc.target/i386/pr59501-1a.c: New test.
> > * gcc.target/i386/pr59501-2.c: New test.
> > * gcc.target/i386/pr59501-2a.c: New test.
> > * gcc.target/i386/pr59501-3.c: New test.
> > * gcc.target/i386/pr59501-3a.c: New test.
> > * gcc.target/i386/pr59501-4.c: New test.
> > * gcc.target/i386/pr59501-4a.c: New test.
> > * gcc.target/i386/pr59501-5.c: New test.
> > * gcc.target/i386/pr59501-6.c: New test.
>
> LGTM, assuming Jakub is OK with the patch.
>
> Thanks,
> Uros.

Jakub, can you take a look at this:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html

Thanks.
>
> >
> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 12:19:20.603212859 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
> > @@ -0,0 +1,8 @@
> > +/* PR target/59501 */
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
> > +
> > +#include "pr59501-3a.c"
> > +
> > +/* Verify no dynamic realignment is performed.  */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
> >
>
> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
> xfailed due to stack frame access via frame pointer instead of DRAP.
> This patch finds the maximum stack alignment from the stack frame access
> instructions and avoids stack realignment if stack alignment needed is
> less than incoming stack boundary.
>
> I am testing this patch.  OK for trunk if there is no regression?
>
>

 We need to keep the preferred stack alignment as the minimum stack
 alignment. Here is the updated patch.  Tested on x86-64.  OK for
 trunk?

 Thanks.
>>>
>>> Hi Jakub,
>>>
>>> This patch fixes the xfailed testcase in your patch:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>>>
>>> Your original patch:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>>>
>>> assumes that any instructions accessing stack require stack
>>> realignment:
>>>
>>> +  FOR_EACH_BB (bb)
>>> +{
>>> +  rtx insn;
>>> +  FOR_BB_INSNS (bb, insn)
>>> +if (NONDEBUG_INSN_P (insn)
>>> + && requires_stack_frame_p (insn, prologue_used,
>>> +   set_up_by_prologue))
>>> +  {
>>> + crtl->stack_realign_needed = stack_realign;
>>> + crtl->stack_realign_finalized = true;
>>> + return;
>>> +  }
>>> + }
>>>
>>> This patch checks 

Re: PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread Uros Bizjak
On Tue, Jul 25, 2017 at 3:52 PM, H.J. Lu  wrote:
> On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
>> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
 On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
 > Hi!
 >
 > Honza recently changed the i?86 backend, so that it often doesn't
 > do -maccumulate-outgoing-args by default on x86_64.
 > Unfortunately, on some of the here included testcases this regressed
 > quite a bit the generated code.  As AVX vectors are used, the dynamic
 > realignment code needs to assume e.g. that some of them will need to be
 > spilled, and for -mno-accumulate-outgoing-args the code needs to set
 > need_drap early as well.  But when emitting the prologue/epilogue,
 > if need_drap is set, we don't perform the optimization for leaf functions
 > which have zero size stack frame, thus we end up with uselessly doing
 > dynamic stack realignment, setting up DRAP that nothing uses and later on
 > restore everything back.
 >
 > This patch improves it, if the DRAP register isn't live at the start of
 > entry bb successor and we aren't going to realign the stack, we don't
 > need DRAP at all, and even if we need DRAP register, that can't be the 
 > sole
 > reason for doing stack realignment, the prologue code is able to set up 
 > DRAP
 > even without dynamic stack realignment.
 >
 > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 >
 > 2013-12-20  Jakub Jelinek  
 >
 > PR target/59501
 > * config/i386/i386.c (ix86_save_reg): Don't return true for 
 > drap_reg
 > if !crtl->stack_realign_needed.
 > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
 > entry
 > and stack_realign_needed will be false, clear drap_reg and 
 > need_drap.
 > Optimize leaf functions that don't need stack frame even if
 > crtl->need_drap.
 >
 > * gcc.target/i386/pr59501-1.c: New test.
 > * gcc.target/i386/pr59501-1a.c: New test.
 > * gcc.target/i386/pr59501-2.c: New test.
 > * gcc.target/i386/pr59501-2a.c: New test.
 > * gcc.target/i386/pr59501-3.c: New test.
 > * gcc.target/i386/pr59501-3a.c: New test.
 > * gcc.target/i386/pr59501-4.c: New test.
 > * gcc.target/i386/pr59501-4a.c: New test.
 > * gcc.target/i386/pr59501-5.c: New test.
 > * gcc.target/i386/pr59501-6.c: New test.

LGTM, assuming Jakub is OK with the patch.

Thanks,
Uros.


 >
 > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 12:19:20.603212859 +0100
 > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
 > @@ -0,0 +1,8 @@
 > +/* PR target/59501 */
 > +/* { dg-do compile { target { ! ia32 } } } */
 > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
 > +
 > +#include "pr59501-3a.c"
 > +
 > +/* Verify no dynamic realignment is performed.  */
 > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
 >

 Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
 xfailed due to stack frame access via frame pointer instead of DRAP.
 This patch finds the maximum stack alignment from the stack frame access
 instructions and avoids stack realignment if stack alignment needed is
 less than incoming stack boundary.

 I am testing this patch.  OK for trunk if there is no regression?


>>>
>>> We need to keep the preferred stack alignment as the minimum stack
>>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>>> trunk?
>>>
>>> Thanks.
>>
>> Hi Jakub,
>>
>> This patch fixes the xfailed testcase in your patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>>
>> Your original patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>>
>> assumes that any instructions accessing stack require stack
>> realignment:
>>
>> +  FOR_EACH_BB (bb)
>> +{
>> +  rtx insn;
>> +  FOR_BB_INSNS (bb, insn)
>> +if (NONDEBUG_INSN_P (insn)
>> + && requires_stack_frame_p (insn, prologue_used,
>> +   set_up_by_prologue))
>> +  {
>> + crtl->stack_realign_needed = stack_realign;
>> + crtl->stack_realign_finalized = true;
>> + return;
>> +  }
>> + }
>>
>> This patch checks the actual alignment needed for any instructions
>> accessing stack.  It skips stack realignment if the actual stack alignment
>> needed is less than or equal to incoming stack alignment.
>>
>> Can you take a look at it?
>>
>> Thanks.
>>
>
> PING
>
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html
>
> --
> 

PING: [PATCH] i386: Avoid stack realignment if possible

2017-07-25 Thread H.J. Lu
On Fri, Jul 14, 2017 at 4:46 AM, H.J. Lu  wrote:
> On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
>> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>>> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
>>> > Hi!
>>> >
>>> > Honza recently changed the i?86 backend, so that it often doesn't
>>> > do -maccumulate-outgoing-args by default on x86_64.
>>> > Unfortunately, on some of the here included testcases this regressed
>>> > quite a bit the generated code.  As AVX vectors are used, the dynamic
>>> > realignment code needs to assume e.g. that some of them will need to be
>>> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>>> > need_drap early as well.  But when emitting the prologue/epilogue,
>>> > if need_drap is set, we don't perform the optimization for leaf functions
>>> > which have zero size stack frame, thus we end up with uselessly doing
>>> > dynamic stack realignment, setting up DRAP that nothing uses and later on
>>> > restore everything back.
>>> >
>>> > This patch improves it, if the DRAP register isn't live at the start of
>>> > entry bb successor and we aren't going to realign the stack, we don't
>>> > need DRAP at all, and even if we need DRAP register, that can't be the 
>>> > sole
>>> > reason for doing stack realignment, the prologue code is able to set up 
>>> > DRAP
>>> > even without dynamic stack realignment.
>>> >
>>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> >
>>> > 2013-12-20  Jakub Jelinek  
>>> >
>>> > PR target/59501
>>> > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>>> > drap_reg
>>> > if !crtl->stack_realign_needed.
>>> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
>>> > entry
>>> > and stack_realign_needed will be false, clear drap_reg and 
>>> > need_drap.
>>> > Optimize leaf functions that don't need stack frame even if
>>> > crtl->need_drap.
>>> >
>>> > * gcc.target/i386/pr59501-1.c: New test.
>>> > * gcc.target/i386/pr59501-1a.c: New test.
>>> > * gcc.target/i386/pr59501-2.c: New test.
>>> > * gcc.target/i386/pr59501-2a.c: New test.
>>> > * gcc.target/i386/pr59501-3.c: New test.
>>> > * gcc.target/i386/pr59501-3a.c: New test.
>>> > * gcc.target/i386/pr59501-4.c: New test.
>>> > * gcc.target/i386/pr59501-4a.c: New test.
>>> > * gcc.target/i386/pr59501-5.c: New test.
>>> > * gcc.target/i386/pr59501-6.c: New test.
>>> >
>>> >
>>> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 12:19:20.603212859 +0100
>>> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
>>> > @@ -0,0 +1,8 @@
>>> > +/* PR target/59501 */
>>> > +/* { dg-do compile { target { ! ia32 } } } */
>>> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
>>> > +
>>> > +#include "pr59501-3a.c"
>>> > +
>>> > +/* Verify no dynamic realignment is performed.  */
>>> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
>>> >
>>>
>>> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
>>> xfailed due to stack frame access via frame pointer instead of DRAP.
>>> This patch finds the maximum stack alignment from the stack frame access
>>> instructions and avoids stack realignment if stack alignment needed is
>>> less than incoming stack boundary.
>>>
>>> I am testing this patch.  OK for trunk if there is no regression?
>>>
>>>
>>
>> We need to keep the preferred stack alignment as the minimum stack
>> alignment. Here is the updated patch.  Tested on x86-64.  OK for
>> trunk?
>>
>> Thanks.
>
> Hi Jakub,
>
> This patch fixes the xfailed testcase in your patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html
>
> Your original patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html
>
> assumes that any instructions accessing stack require stack
> realignment:
>
> +  FOR_EACH_BB (bb)
> +{
> +  rtx insn;
> +  FOR_BB_INSNS (bb, insn)
> +if (NONDEBUG_INSN_P (insn)
> + && requires_stack_frame_p (insn, prologue_used,
> +   set_up_by_prologue))
> +  {
> + crtl->stack_realign_needed = stack_realign;
> + crtl->stack_realign_finalized = true;
> + return;
> +  }
> + }
>
> This patch checks the actual alignment needed for any instructions
> accessing stack.  It skips stack realignment if the actual stack alignment
> needed is less than or equal to incoming stack alignment.
>
> Can you take a look at it?
>
> Thanks.
>

PING

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00400.html

-- 
H.J.


Re: [PATCH] i386: Avoid stack realignment if possible

2017-07-14 Thread H.J. Lu
On Fri, Jul 7, 2017 at 5:56 PM, H.J. Lu  wrote:
> On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
>> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
>> > Hi!
>> >
>> > Honza recently changed the i?86 backend, so that it often doesn't
>> > do -maccumulate-outgoing-args by default on x86_64.
>> > Unfortunately, on some of the here included testcases this regressed
>> > quite a bit the generated code.  As AVX vectors are used, the dynamic
>> > realignment code needs to assume e.g. that some of them will need to be
>> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
>> > need_drap early as well.  But when emitting the prologue/epilogue,
>> > if need_drap is set, we don't perform the optimization for leaf functions
>> > which have zero size stack frame, thus we end up with uselessly doing
>> > dynamic stack realignment, setting up DRAP that nothing uses and later on
>> > restore everything back.
>> >
>> > This patch improves it, if the DRAP register isn't live at the start of
>> > entry bb successor and we aren't going to realign the stack, we don't
>> > need DRAP at all, and even if we need DRAP register, that can't be the sole
>> > reason for doing stack realignment, the prologue code is able to set up 
>> > DRAP
>> > even without dynamic stack realignment.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> > 2013-12-20  Jakub Jelinek  
>> >
>> > PR target/59501
>> > * config/i386/i386.c (ix86_save_reg): Don't return true for 
>> > drap_reg
>> > if !crtl->stack_realign_needed.
>> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on 
>> > entry
>> > and stack_realign_needed will be false, clear drap_reg and 
>> > need_drap.
>> > Optimize leaf functions that don't need stack frame even if
>> > crtl->need_drap.
>> >
>> > * gcc.target/i386/pr59501-1.c: New test.
>> > * gcc.target/i386/pr59501-1a.c: New test.
>> > * gcc.target/i386/pr59501-2.c: New test.
>> > * gcc.target/i386/pr59501-2a.c: New test.
>> > * gcc.target/i386/pr59501-3.c: New test.
>> > * gcc.target/i386/pr59501-3a.c: New test.
>> > * gcc.target/i386/pr59501-4.c: New test.
>> > * gcc.target/i386/pr59501-4a.c: New test.
>> > * gcc.target/i386/pr59501-5.c: New test.
>> > * gcc.target/i386/pr59501-6.c: New test.
>> >
>> >
>> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 12:19:20.603212859 +0100
>> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
>> > @@ -0,0 +1,8 @@
>> > +/* PR target/59501 */
>> > +/* { dg-do compile { target { ! ia32 } } } */
>> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
>> > +
>> > +#include "pr59501-3a.c"
>> > +
>> > +/* Verify no dynamic realignment is performed.  */
>> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
>> >
>>
>> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
>> xfailed due to stack frame access via frame pointer instead of DRAP.
>> This patch finds the maximum stack alignment from the stack frame access
>> instructions and avoids stack realignment if stack alignment needed is
>> less than incoming stack boundary.
>>
>> I am testing this patch.  OK for trunk if there is no regression?
>>
>>
>
> We need to keep the preferred stack alignment as the minimum stack
> alignment. Here is the updated patch.  Tested on x86-64.  OK for
> trunk?
>
> Thanks.

Hi Jakub,

This patch fixes the xfailed testcase in your patch:

https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01767.html

Your original patch:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01058.html

assumes that any instructions accessing stack require stack
realignment:

+  FOR_EACH_BB (bb)
+{
+  rtx insn;
+  FOR_BB_INSNS (bb, insn)
+if (NONDEBUG_INSN_P (insn)
+ && requires_stack_frame_p (insn, prologue_used,
+   set_up_by_prologue))
+  {
+ crtl->stack_realign_needed = stack_realign;
+ crtl->stack_realign_finalized = true;
+ return;
+  }
+ }

This patch checks the actual alignment needed for any instructions
accessing stack.  It skips stack realignment if the actual stack alignment
needed is less than or equal to incoming stack alignment.
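
To make the difference concrete, here is a small sketch contrasting the two
policies.  It is a toy model, not GCC code: insn_model, old_policy_realigns
and new_policy_realigns are made-up names, each instruction is reduced to
whether it touches the stack frame and the alignment (in bits) that access
needs, and the older rule is modelled under the assumption that realignment
had been requested to begin with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an instruction; hypothetical, not GCC's rtx.  */
struct insn_model
{
  bool accesses_stack_frame;   /* Does the insn touch the stack frame?  */
  unsigned int align_bits;     /* Alignment that access needs, in bits.  */
};

/* Older rule: any insn that needs the stack frame keeps realignment.  */
bool
old_policy_realigns (const struct insn_model *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i].accesses_stack_frame)
      return true;
  return false;
}

/* Rule described here: realign only if some stack frame access needs
   more alignment than the incoming stack already provides.  */
bool
new_policy_realigns (const struct insn_model *insns, size_t n,
                     unsigned int incoming_align_bits)
{
  assert (incoming_align_bits != 0);
  for (size_t i = 0; i < n; i++)
    if (insns[i].accesses_stack_frame
        && insns[i].align_bits > incoming_align_bits)
      return true;
  return false;
}
```

With a 128-bit incoming stack alignment, a function whose only stack frame
access needs 64 bits keeps realignment under the older rule but not under the
alignment-aware one.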

Can you take a look at it?

Thanks.


> H.J.
> ---
> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
> xfailed due to stack frame access via frame pointer instead of DRAP.
> This patch finds the maximum stack alignment from the stack frame access
> instructions and avoids stack realignment if stack alignment needed is
> less than incoming stack boundary.
>
> gcc/
>
> PR target/59501
> * config/i386/i386.c (ix86_finalize_stack_realign_flags): Don't
> realign stack if stack alignment needed is less than incoming
> stack boundary.

[PATCH] i386: Avoid stack realignment if possible

2017-07-07 Thread H.J. Lu
On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek  wrote:
> > Hi!
> >
> > Honza recently changed the i?86 backend, so that it often doesn't
> > do -maccumulate-outgoing-args by default on x86_64.
> > Unfortunately, on some of the here included testcases this regressed
> > quite a bit the generated code.  As AVX vectors are used, the dynamic
> > realignment code needs to assume e.g. that some of them will need to be
> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
> > need_drap early as well.  But when emitting the prologue/epilogue,
> > if need_drap is set, we don't perform the optimization for leaf functions
> > which have zero size stack frame, thus we end up with uselessly doing
> > dynamic stack realignment, setting up DRAP that nothing uses and later on
> > restore everything back.
> >
> > This patch improves it, if the DRAP register isn't live at the start of
> > entry bb successor and we aren't going to realign the stack, we don't
> > need DRAP at all, and even if we need DRAP register, that can't be the sole
> > reason for doing stack realignment, the prologue code is able to set up DRAP
> > even without dynamic stack realignment.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2013-12-20  Jakub Jelinek  
> >
> > PR target/59501
> > * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
> > if !crtl->stack_realign_needed.
> > (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
> > and stack_realign_needed will be false, clear drap_reg and need_drap.
> > Optimize leaf functions that don't need stack frame even if
> > crtl->need_drap.
> >
> > * gcc.target/i386/pr59501-1.c: New test.
> > * gcc.target/i386/pr59501-1a.c: New test.
> > * gcc.target/i386/pr59501-2.c: New test.
> > * gcc.target/i386/pr59501-2a.c: New test.
> > * gcc.target/i386/pr59501-3.c: New test.
> > * gcc.target/i386/pr59501-3a.c: New test.
> > * gcc.target/i386/pr59501-4.c: New test.
> > * gcc.target/i386/pr59501-4a.c: New test.
> > * gcc.target/i386/pr59501-5.c: New test.
> > * gcc.target/i386/pr59501-6.c: New test.
> >
> >
> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj   2013-12-20 12:19:20.603212859 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
> > @@ -0,0 +1,8 @@
> > +/* PR target/59501 */
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
> > +
> > +#include "pr59501-3a.c"
> > +
> > +/* Verify no dynamic realignment is performed.  */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
> >
> 
> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
> xfailed due to stack frame access via frame pointer instead of DRAP.
> This patch finds the maximum stack alignment from the stack frame access
> instructions and avoids stack realignment if stack alignment needed is
> less than incoming stack boundary.
> 
> I am testing this patch.  OK for trunk if there is no regression?
> 
> 

We need to keep the preferred stack alignment as the minimum stack
alignment. Here is the updated patch.  Tested on x86-64.  OK for
trunk?

Thanks.

H.J.
---
Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
xfailed due to stack frame access via frame pointer instead of DRAP.
This patch finds the maximum stack alignment from the stack frame access
instructions and avoids stack realignment if stack alignment needed is
less than incoming stack boundary.

gcc/

PR target/59501
* config/i386/i386.c (ix86_finalize_stack_realign_flags): Don't
realign stack if stack alignment needed is less than incoming
stack boundary.

gcc/testsuite/

PR target/59501
* gcc.target/i386/pr59501-4a.c: Remove xfail.
---
 gcc/config/i386/i386.c | 84 +++---
 gcc/testsuite/gcc.target/i386/pr59501-4a.c |  2 +-
 2 files changed, 56 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b041524..28febd0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14161,6 +14161,11 @@ ix86_finalize_stack_realign_flags (void)
   add_to_hard_reg_set (&set_up_by_prologue, Pmode, ARG_POINTER_REGNUM);
   add_to_hard_reg_set (&set_up_by_prologue, Pmode,
                        HARD_FRAME_POINTER_REGNUM);
+
+  /* The preferred stack alignment is the minimum stack alignment.  */
+  unsigned int stack_alignment = crtl->preferred_stack_boundary;
+  bool require_stack_frame = false;
+
   FOR_EACH_BB_FN (bb, cfun)
     {
       rtx_insn *insn;
@@ -14169,43 +14174,64 @@ ix86_finalize_stack_realign_flags (void)