https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89618

            Bug ID: 89618
           Summary: Inner loop won't vectorize unless dummy statement is
                    included
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moritz.kreutzer at siemens dot com
  Target Milestone: ---

We have a loop in which we scatter data to an array of length "n", where we
can guarantee the absence of write conflicts only within confined ranges of
length "m". Our implementation therefore splits this loop into an outer and an
inner loop and specifies "#pragma GCC ivdep" for the inner loop:


https://godbolt.org/z/ulnRrk
=======================================================
const int m = 32;

for (int j = 0; j < n/m; ++j)
{
  int const start = j*m;
  int const end = (j+1)*m;

  #pragma GCC ivdep
  for (int i = start; i < end; ++i)
  {
    a[off[i]] = a[i] < 0 ? a[i] : 0;
  }

#ifdef VECTORIZE
  // dummy statement required for vectorization
  if (a[0] == 0.) a[0] = 0.; 
#endif
}
=======================================================
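For experimenting without the godbolt link, here is a minimal self-contained
sketch of the same loop nest (without the dummy statement). It assumes "a" is
an array of doubles (suggested by the "0." literal above), "off" an array of
ints in [0, n), and n a multiple of m. The names scatter_blocked and
ivdep_holds are hypothetical. The helper is a conservative debug check of what
the pragma asserts for the inner loop: within each block of m iterations there
is no loop-carried dependence through "a".

```c
#include <stdbool.h>

/* Block length from the report. */
static const int m = 32;

/* The report's loop nest without the dummy statement. Assumptions (not
   from the report): a holds n doubles, off holds n indices in [0, n),
   and n is a multiple of m; otherwise the loop skips the last n % m
   elements. */
void scatter_blocked(double *a, const int *off, int n)
{
  for (int j = 0; j < n / m; ++j)
  {
    int const start = j * m;
    int const end = (j + 1) * m;

#pragma GCC ivdep
    for (int i = start; i < end; ++i)
      a[off[i]] = a[i] < 0 ? a[i] : 0;
  }
}

/* Hypothetical debug helper: returns true if no block of m iterations
   has a loop-carried dependence through a[], i.e. no two iterations of a
   block write the same element and no iteration writes an element that
   another iteration of the same block reads. */
bool ivdep_holds(const int *off, int n)
{
  for (int start = 0; start + m <= n; start += m)
  {
    int const end = start + m;
    for (int i = start; i < end; ++i)
    {
      /* a[off[i]] is read (as a[k]) by iteration k = off[i] of this block. */
      if (off[i] != i && off[i] >= start && off[i] < end)
        return false;
      /* Two iterations of this block write the same element. */
      for (int k = i + 1; k < end; ++k)
        if (off[k] == off[i])
          return false;
    }
  }
  return true;
}
```

With off[i] = (i + 32) % 64 and n = 64, each block scatters into the other
block, so ivdep_holds returns true; reversing the indices within each block
makes iterations overwrite elements that later iterations of the same block
still read, and the check fails.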

The issue is that GCC (trunk and all earlier versions) won't vectorize the code
unless we add the obviously useless dummy statement (guarded by "#ifdef
VECTORIZE"). This is counterintuitive, adds overhead that we want to avoid, and
may be cumbersome or even impossible to implement depending on the specific
structure of the inner loop (the body may be passed as a lambda, etc.).

Without knowing the internals of GCC, I imagine that in the absence of the
dummy statement, GCC unrolls and jams the loops and then tries, and fails, to
vectorize the remaining (outer) loop because it has no "ivdep" pragma. Can we
avoid this behavior? If my thinking is correct, something like ICC's "#pragma
nounroll_and_jam" could work, but as far as I can see GCC doesn't (officially?)
support anything like it.

Thanks,
Moritz
