https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825
Bug ID: 81825
Summary: x86_64 stack realignment code is suboptimal
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: luto at kernel dot org
Target Milestone: ---

I compiled this:

    void func() {
        int var __attribute__((aligned(32)));
        asm volatile ("" :: "m" (var));
    }

using gcc (GCC) 7.1.1 20170622 (Red Hat 7.1.1-3). I got (after stripping
CFI stuff):

    func:
        leaq    8(%rsp), %r10
        andq    $-32, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r10
        popq    %r10
        popq    %rbp
        leaq    -8(%r10), %rsp
        ret

I have three objections to this code.

1. The push and immediate pop of %r10 seem pointless. Maybe it's due to
   some weird DWARF limitation? A register allocation limitation sounds
   more likely, though.

2. The addressing modes used for %r10 are suboptimal. Shouldn't the first
   instruction be just movq %rsp, %r10? By my count, this would save 12
   bytes of text.

3. Couldn't the whole thing just be:

        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp

        [function body here -- %rbp can't be used to locate stack
        variables, but %rsp can]

        leaveq
        ret