"Devulapalli, Raghuveer" <raghuveer.devulapa...@intel.com> writes:
> Great catch! From the intrinsic manual:
>
> Cast vector of type __m128i to type __m512i; the upper 384 bits of the
> result are undefined.

Just out of curiosity, what kind of optimization (like what -O2 does)
could mask this issue?

> Replacing that with _mm512_zextsi128_si512 fixes the problem.

Congratulations!

-- 
Best Regards
Andy Fan
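For anyone following along, here is a minimal C sketch of the difference being discussed. It assumes x86-64 with GCC or Clang, and a compiler recent enough to provide `_mm512_zextsi128_si512`. The helper names (`widen_zext`, `widen_zext_avx512`, `widen_zext_scalar`) are hypothetical and not taken from the code under discussion. The AVX-512 path uses the zero-extending intrinsic, which guarantees that lanes 4..15 are zero. `_mm512_castsi128_si512` would leave those lanes undefined. A scalar fallback with the same semantics runs when the CPU lacks AVX-512F.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Hypothetical helper: widen four 32-bit lanes into 512 bits with the
   upper 384 bits guaranteed zero. Compiled for AVX-512F via the target
   attribute, so no global -mavx512f flag is needed. */
__attribute__((target("avx512f")))
static void widen_zext_avx512(const int32_t in[4], int32_t out[16]) {
    __m128i lo = _mm_loadu_si128((const __m128i *)in);
    /* _mm512_castsi128_si512(lo) would leave lanes 4..15 undefined;
       the zext variant defines them as zero. */
    __m512i wide = _mm512_zextsi128_si512(lo);
    _mm512_storeu_si512((void *)out, wide);
}
#endif

/* Portable reference with the same semantics: copy 4 lanes, zero the rest. */
static void widen_zext_scalar(const int32_t in[4], int32_t out[16]) {
    memset(out, 0, 16 * sizeof out[0]);
    memcpy(out, in, 4 * sizeof in[0]);
}

/* Dispatch: use the AVX-512 path only if the running CPU supports it. */
static void widen_zext(const int32_t in[4], int32_t out[16]) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f")) {
        widen_zext_avx512(in, out);
        return;
    }
#endif
    widen_zext_scalar(in, out);
}
```

Prefilling `out` with a nonzero byte pattern before calling `widen_zext` is a simple way to check that the zeros in lanes 4..15 come from the widening itself rather than from whatever happened to be in memory.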