https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

            Bug ID: 80286
           Summary: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't
                    return a proper 32bits int
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gregory.hainaut at gmail dot com
  Target Milestone: ---

Dear GCC developers,

It seems that G++ 4.9 introduced an optimization to reduce stack/memory accesses
that broke the behavior of _mm_cvtsi128_si32.

Note: I tested various GCC versions on godbolt.org; I don't know whether its
GCC 7 snapshot is recent.
Note 2: the issue may belong to RTL/tree optimization instead, but I am not sure.

Here is a small test case:
---------------------8<-----------------------------------
#include <immintrin.h>

__m256i m;

__m128i extract(__m128i minmax)
{
    int shift = _mm_cvtsi128_si32(_mm256_castsi256_si128(m));
    return _mm_srli_epi16(minmax, shift);
}
--------------------->8-----------------------------------
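On recent GCC it is compiled to the following two asm instructions. The issue is
that the shift count operand of vpsrlw is 64 bits wide, so "shift" must be
zero-extended to 64 bits first. GCC instead passes the register holding m
directly, so bits 32..63 of m become part of the count. Clang typically uses
vpmovzxdq for the zero extension.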

 vmovdqa m(%rip), %ymm1
 vpsrlw  %xmm1, %xmm0, %xmm0
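
To illustrate why the upper bits matter, here is a minimal sketch (not part of
the original test case) that compares the two ways of supplying the count. It
uses only SSE2 intrinsics so it does not need AVX2 hardware. The helper names
lane0_shift_by_vector, lane0_shift_by_low32 and check_count_width are made up
for this example. Note that with an affected compiler and optimization enabled,
the first assertion might itself fail, since it exercises the same pattern.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Shift every 16-bit lane of v right, taking the count from a vector
   register. PSRLW reads the full low 64 bits of `count`. Returns lane 0. */
static unsigned lane0_shift_by_vector(__m128i v, __m128i count)
{
    return (unsigned)_mm_extract_epi16(_mm_srl_epi16(v, count), 0);
}

/* Shift using only the low 32 bits of `count`, which is what the C source
   in the test case asks for. Returns lane 0. */
static unsigned lane0_shift_by_low32(__m128i v, __m128i count)
{
    int shift = _mm_cvtsi128_si32(count);
    return (unsigned)_mm_extract_epi16(_mm_srli_epi16(v, shift), 0);
}

/* Hypothetical check: the two helpers disagree as soon as bits 32..63 of
   the count register are non-zero. */
static void check_count_width(void)
{
    __m128i v = _mm_set1_epi16(0x0100);
    /* _mm_set_epi32 lists elements from high to low:
       bits 0..31 = 4, bits 32..63 = 1. */
    __m128i count = _mm_set_epi32(0, 0, 1, 4);

    /* Low 32 bits only: 0x0100 >> 4 == 0x0010. */
    assert(lane0_shift_by_low32(v, count) == 0x0010);
    /* Full 64-bit count is far above 15, so every lane becomes 0. */
    assert(lane0_shift_by_vector(v, count) == 0);
}
```

Calling check_count_width() from a main() and building at -O0 and -O2 should
show whether a given compiler keeps the two cases distinct.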

Best Regards,
Gregory
