https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683
Bug ID: 111683
Summary: Incorrect answer when using SSE2 intrinsics with -O3
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: deodharvinit99 at gmail dot com
Target Milestone: ---

Created attachment 56042
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56042&action=edit
convn_script

g++ produces incorrect answers when SSE2 intrinsics are used with -O3.
With -O2, the answers are the same as those of equivalent code written
without SSE2.

A standalone repro shell script:

g++ -O2 convn_script.cpp
./a.out
g++ -O3 convn_script.cpp
./a.out

Target: x86_64-linux-gnu
gcc version 10.2.1 20210110 (Debian 10.2.1-6)

convn_script.cpp:

#include <cstring>
#include <emmintrin.h>
#include <iostream>

// Function Definitions
//
// Check properties of input in9
//
// Arguments    : const double in9[10]
//                const double in10[7]
//                double out5[16]
// Return Type  : void
//
void convn_script(const double in9[10], const double in10[7], double out5[16])
{
  int iB;
  int iC;
  // Check properties of input in10
  std::memset(&out5[0], 0, 16U * sizeof(double));
  iC = 0;
  iB = 0;
  for (int i{0}; i < 7; i++) {
    int b_i;
    int vectorUB;
    if (i + 10 <= 15) {
      b_i = 9;
    } else {
      b_i = 15 - i;
    }
    vectorUB = (((b_i + 1) / 2) << 1) - 2;
    for (int r{0}; r <= vectorUB; r += 2) {
      __m128d b_r;
      b_i = iC + r;
      b_r = _mm_loadu_pd(&out5[b_i]);
      _mm_storeu_pd(&out5[b_i], _mm_add_pd(b_r, _mm_mul_pd(_mm_set1_pd(in10[iB]), _mm_loadu_pd(&in9[r]))));
    }
    iC = iB + 1;
    iB++;
  }
}

int main()
{
  double in9[10] = {0.8147, 0.9058, 0.1270, 0.9134, 0.6324,
                    0.0975, 0.2785, 0.5469, 0.9575, 0.9649};
  double in10[7] = {0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218};
  double out5[16];
  convn_script(in9, in10, out5);
  for (int i = 0; i < 16; i++) {
    std::cout << "Out[" << i << "] = " << out5[i] << "\n";
  }
}
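
The report refers to "an equivalent code written without SSE2" but does not include it. The following is a minimal sketch of what such a scalar reference might look like, assuming the intrinsic loop is meant to compute the full convolution out5[i + r] += in10[i] * in9[r] for i in 0..6 and r in 0..9. The names convn_scalar and max_abs_diff are hypothetical and are not part of the attachment.

```cpp
#include <cmath>
#include <cstring>

// Hypothetical scalar reference (not from the attachment): full convolution
// of a 10-element input with a 7-element kernel into 16 outputs, written
// without intrinsics. Assumes this is what the SSE2 loop is meant to compute.
void convn_scalar(const double in9[10], const double in10[7], double out5[16])
{
  std::memset(&out5[0], 0, 16U * sizeof(double));
  for (int i = 0; i < 7; i++) {
    for (int r = 0; r < 10; r++) {
      // Same update the SSE2 version applies to two lanes at a time.
      out5[i + r] += in10[i] * in9[r];
    }
  }
}

// Hypothetical helper: largest absolute element-wise difference between two
// arrays of length n, for comparing the SSE2 output against the reference.
double max_abs_diff(const double *a, const double *b, int n)
{
  double worst = 0.0;
  for (int k = 0; k < n; k++) {
    double d = std::fabs(a[k] - b[k]);
    if (d > worst) {
      worst = d;
    }
  }
  return worst;
}
```

One way to use it is to call convn_script and convn_scalar on the same in9 and in10 inside the -O2 and -O3 builds, then print max_abs_diff of the two out5 arrays. If the two versions are in fact equivalent, a difference well above rounding error in the -O3 build alone would point at the optimizer rather than at the source.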