https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825

            Bug ID: 81825
           Summary: x86_64 stack realignment code is suboptimal
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luto at kernel dot org
  Target Milestone: ---

I compiled this:

void func()
{
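        /* The over-aligned local forces dynamic stack realignment; the
           empty asm keeps it from being optimized away. */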
        int var __attribute__((aligned(32)));
        asm volatile ("" :: "m" (var));
}

using gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3).  I got the following (with
the CFI directives stripped):

func:
        leaq    8(%rsp), %r10
        andq    $-32, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r10
        popq    %r10
        popq    %rbp
        leaq    -8(%r10), %rsp
        ret

I have three objections to this code.

1. The push and immediate pop of %r10 seem pointless.  Maybe it's due to some
weird DWARF limitation?  A register-allocation limitation sounds more likely,
though.
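
For comparison, here is a hand-written sketch of the same sequence with that
pair dropped; it only works on the assumption (true for this empty function)
that nothing in the body clobbers %r10:

func:
        leaq    8(%rsp), %r10
        andq    $-32, %rsp
        pushq   -8(%r10)        # re-push a copy of the return address
        pushq   %rbp
        movq    %rsp, %rbp
        # function body (must leave %r10 intact)
        popq    %rbp
        leaq    -8(%r10), %rsp
        ret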

2. The addressing modes used for %r10 are suboptimal.  Shouldn't the first
instruction be just movq %rsp, %r10?  By my count, this would save 12 bytes of
text.
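
Concretely, a hand-written (untested) sketch of the same frame setup using the
mov forms:

func:
        movq    %rsp, %r10      # %r10 = incoming %rsp (points at the return address)
        andq    $-32, %rsp
        pushq   (%r10)          # re-push a copy of the return address
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r10
        # function body
        popq    %r10
        popq    %rbp
        movq    %r10, %rsp      # back to the incoming %rsp
        ret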

3. Couldn't the whole thing just be:

        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp

(function body here; %rbp can't be used to locate the realigned stack
variables, since its distance from the aligned area isn't a compile-time
constant, but %rsp can)

        leaveq
        ret
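
For the example above, that could look something like this (a sketch only; the
subq amount and the (%rsp) slot for var are illustrative, not necessarily what
gcc would pick):

func:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp
        subq    $32, %rsp       # reserve a 32-byte-aligned slot for var at (%rsp)
        # body addresses var relative to the aligned %rsp, e.g. as (%rsp)
        leaveq                  # movq %rbp, %rsp; popq %rbp
        ret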
