https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

            Bug ID: 98167
           Summary: [x86] Failure to optimize operation on identically
                    shuffled operand into a shuffle of the result of the
                    operation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <xmmintrin.h>

__m128 f(__m128 a, __m128 b) {
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}

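Both operands are broadcast from element 0, so every element of the result
is a[0] * b[0]. A single scalar multiply followed by one shuffle gives the
same value, so this can be optimized to: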

__m128 f(__m128 a, __m128 b) {
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}

This transformation is done by LLVM, but not by GCC.
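
As a rough check of the equivalence, the two forms can be compared bit for bit. This is only a sketch: the names f_shuffle_then_mul, f_mul_then_shuffle and forms_agree are made up for illustration and are not part of the report, and it assumes an x86 target with SSE enabled.

```c
#include <string.h>
#include <xmmintrin.h>

/* Form from the report: broadcast element 0 of each operand, then multiply. */
static __m128 f_shuffle_then_mul(__m128 a, __m128 b) {
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}

/* Proposed form: scalar multiply of element 0, then broadcast the product. */
static __m128 f_mul_then_shuffle(__m128 a, __m128 b) {
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}

/* Hypothetical helper: returns 1 if both forms give bit-identical vectors. */
static int forms_agree(__m128 a, __m128 b) {
    float x[4], y[4];
    _mm_storeu_ps(x, f_shuffle_then_mul(a, b));
    _mm_storeu_ps(y, f_mul_then_shuffle(a, b));
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling forms_agree on a handful of inputs is not a proof, but it is a quick sanity check when comparing the code generated for the two functions.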
