https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89680

            Bug ID: 89680
           Summary: Redundant moves with -march=skylake for long long shift on
                    32bit x86
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase:

--cut here--
unsigned long long foo (unsigned long long i)
{
  return i << 3;
}
--cut here--

compiles with -O2 -march=skylake -m32 to:

        subl    $28, %esp
        movl    32(%esp), %eax
        movl    36(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        vmovdqa (%esp), %xmm1
        addl    $28, %esp
        vpsllq  $3, %xmm1, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

but with -O2 -march=haswell -m32 to:

        vmovq   4(%esp), %xmm0
        vpsllq  $3, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

The difference starts in IRA pass with:

 Pass 0 for finding pseudo/allocno costs

     a0 (r88,l0) best DREG, allocno DREG
     a1 (r87,l0) best AREG, allocno AREG
     a2 (r85,l0) best NO_REX_SSE_REGS, allocno NO_REX_SSE_REGS
-    a3 (r83,l0) best NO_REX_SSE_REGS, allocno NO_REX_SSE_REGS
+    a3 (r83,l0) best NO_REGS, allocno NO_REGS

 Pass 1 for finding pseudo/allocno costs

     r88: preferred DREG, alternative GENERAL_REGS, allocno GENERAL_REGS
     r87: preferred AREG, alternative GENERAL_REGS, allocno GENERAL_REGS
     r85: preferred NO_REX_SSE_REGS, alternative NO_REGS, allocno NO_REX_SSE_REGS
-    r83: preferred NO_REX_SSE_REGS, alternative NO_REGS, allocno NO_REX_SSE_REGS
+    r83: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS

and going downhill from there.
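
For reference when reading the two assembly listings, the shift in the testcase can also be written on two 32-bit halves, which is roughly the arithmetic any 32-bit lowering has to perform. The sketch below is only an illustration of the semantics: foo_halves and self_check are hypothetical names that do not appear in the report, and nothing here claims to reflect what GCC emits for either -march setting.

--cut here--
#include <assert.h>
#include <stdint.h>

/* Testcase from the report. */
unsigned long long foo (unsigned long long i)
{
  return i << 3;
}

/* Hypothetical helper (not from the report): the same shift
   computed on the low and high 32-bit halves separately.  */
unsigned long long foo_halves (unsigned long long i)
{
  uint32_t lo = (uint32_t) i;
  uint32_t hi = (uint32_t) (i >> 32);

  /* The top 3 bits of the low half move into the high half.  */
  uint32_t new_hi = (hi << 3) | (lo >> 29);
  uint32_t new_lo = lo << 3;

  return ((unsigned long long) new_hi << 32) | new_lo;
}

/* Hypothetical self-check: both forms agree on a few inputs.  */
void self_check (void)
{
  assert (foo (1ULL) == foo_halves (1ULL));
  assert (foo (0x20000000ULL) == foo_halves (0x20000000ULL));
  assert (foo (0x0123456789abcdefULL) == foo_halves (0x0123456789abcdefULL));
  assert (foo (~0ULL) == foo_halves (~0ULL));
}
--cut here--

Calling self_check () from a small main is enough to confirm that both functions compute the same value. Compiling foo alone with the flags quoted above (-O2 -m32 plus either -march=skylake or -march=haswell) is what produces the listings being compared.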