https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89618
Bug ID: 89618
Summary: Inner loop won't vectorize unless dummy statement is included
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: moritz.kreutzer at siemens dot com
Target Milestone: ---

We have a loop in which we are scattering data to an array of length "n", where we can guarantee the absence of write conflicts only within confined ranges of length "m". Our implementation splits this loop into an outer and an inner loop and specifies "#pragma GCC ivdep" for the inner loop:

https://godbolt.org/z/ulnRrk

=======================================================
const int m = 32;

for (int j = 0; j < n/m; ++j) {
    int const start = j*m;
    int const end = (j+1)*m;

#pragma GCC ivdep
    for (int i = start; i < end; ++i) {
        a[off[i]] = a[i] < 0 ? a[i] : 0;
    }

#ifdef VECTORIZE
    // dummy statement required for vectorization
    if (a[0] == 0.) a[0] = 0.;
#endif
}
=======================================================

The issue is that GCC (trunk and any earlier version) won't vectorize the code unless we add the obviously useless dummy statement (guarded by "#ifdef VECTORIZE"). This is counterintuitive, adds overhead that we want to avoid, and may be cumbersome or even impossible to implement depending on the specific structure of the inner loop (the body may be passed as a lambda, etc.).

Without knowing the internals of GCC, I can imagine that in the absence of the dummy statement, GCC jams the loops and then tries, and fails, to vectorize the remaining (outer) loop because it doesn't have an "ivdep" pragma.

Can we avoid this behavior? If my thinking is correct, something like ICC's "#pragma nounroll_and_jam" could work, but GCC doesn't (officially?) support anything like it as far as I can see.

Thanks,
Moritz
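
For reference, a self-contained version of the snippet above might look like the following sketch. The function name "scatter_blocks", its signature, and the element type "double" are assumptions made for illustration; the snippet itself does not show the declarations of "a", "off", or "n".

```c
/* Sketch of a standalone reproducer. The name scatter_blocks, the parameter
 * list, and the element type double are assumptions, not part of the
 * original snippet. */
static const int m = 32;

void scatter_blocks(double *a, const int *off, int n)
{
    for (int j = 0; j < n / m; ++j) {
        int const start = j * m;
        int const end = (j + 1) * m;

        /* The caller promises no write conflicts within one block of m. */
#pragma GCC ivdep
        for (int i = start; i < end; ++i) {
            a[off[i]] = a[i] < 0 ? a[i] : 0;
        }

#ifdef VECTORIZE
        /* dummy statement required for vectorization */
        if (a[0] == 0.) a[0] = 0.;
#endif
    }
}
```

Compiling this once with and once without "-DVECTORIZE", for example with "gcc -O3 -fopt-info-vec", should make it possible to compare which loops GCC reports as vectorized in each case.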