https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Bug ID: 98167
Summary: [x86] Failure to optimize operation on identically shuffled operands into a shuffle of the result of the operation
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

__m128 f(__m128 a, __m128 b)
{
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}

This can be optimized to:

__m128 f(__m128 a, __m128 b)
{
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}

This transformation is done by LLVM, but not by GCC.
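
For anyone wanting to check the equivalence, the two forms can be compared lane by lane. What follows is only a minimal sketch, assuming an x86 target with SSE enabled. The names `f_original`, `f_optimized`, `forms_agree` and `check_example` are made up for illustration and are not part of the report or of GCC.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>

/* Form from the report: broadcast lane 0 of each operand,
   then multiply all four lanes. */
static __m128 f_original(__m128 a, __m128 b)
{
    return _mm_mul_ps(_mm_shuffle_ps(a, a, 0), _mm_shuffle_ps(b, b, 0));
}

/* Suggested form: multiply lane 0 only, then broadcast the product. */
static __m128 f_optimized(__m128 a, __m128 b)
{
    __m128 tmp = _mm_mul_ss(a, b);
    return _mm_shuffle_ps(tmp, tmp, 0);
}

/* Hypothetical helper: returns 1 if both forms produce
   bit-identical vectors for the given inputs, 0 otherwise. */
static int forms_agree(__m128 a, __m128 b)
{
    float x[4], y[4];
    _mm_storeu_ps(x, f_original(a, b));
    _mm_storeu_ps(y, f_optimized(a, b));
    return memcmp(x, y, sizeof x) == 0;
}

/* Example check with arbitrary inputs; only lane 0 of each
   operand contributes to the result in either form. */
static void check_example(void)
{
    __m128 a = _mm_setr_ps(1.5f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(-2.0f, 5.0f, 6.0f, 7.0f);
    assert(forms_agree(a, b));
}
```

Calling `check_example()` from a test program exercises both forms on one input pair. Both forms compute the same single product `a[0] * b[0]` and place it in every lane, so the comparison is expected to hold for other inputs as well.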