https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91722
Bug ID: 91722
Summary: gcc generates sub-optimal assembly when AVX instructions are used.
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---

The following code:

    #include <immintrin.h>

    __m256 copysign_ps(__m256 from, __m256 to) {
        constexpr float signbit = -0.f;
        auto const avx_sigbit = _mm256_broadcast_ss(&signbit);
        return _mm256_or_ps(_mm256_and_ps(avx_sigbit, from), _mm256_andnot_ps(avx_sigbit, to));
    }

When compiled with `g++-9.2 -O2 -mavx -std=c++11` produces the following assembly:

    copysign_ps(float __vector(8), float __vector(8)):
            push    rbp
            vmovaps ymm2, ymm0
            mov     rbp, rsp
            and     rsp, -32
            vbroadcastss    ymm0, DWORD PTR .LC0[rip]
            vandnps ymm1, ymm0, ymm1
            vandps  ymm0, ymm0, ymm2
            vorps   ymm0, ymm0, ymm1
            leave
            ret
    .LC0:
            .long   2147483648

The 4 instructions involving rbp, rsp and leave do not seem to be necessary at all.

When compiled with `clang++-8.0 -O2 -mavx -std=c++11` it produces assembly with only expected instructions:

    .LCPI0_0:
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
            .long   2147483648              # 0x80000000
    .LCPI0_1:
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
            .long   2147483647              # 0x7fffffff
    copysign_ps(float __vector(8), float __vector(8)):  # @copysign_ps(float __vector(8), float __vector(8))
            vandps  ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
            vandps  ymm1, ymm1, ymmword ptr [rip + .LCPI0_1]
            vorps   ymm0, ymm1, ymm0
            ret