https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Target|                            |x86_64-*-*
     Ever confirmed|0                           |1
             Blocks|                            |53947
   Last reconfirmed|                            |2021-01-04

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that we need to vectorize this as a reduction, and since there's
no "masked scalar store" on GIMPLE, LIM (loop invariant motion) itself doesn't
help.  The reason LIM doesn't apply store-motion here is the _load_, which can
trap.  LIM would like to do

  ret0 = ret[0];
  bool stored = false;
  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        {
          ret0 += x[i];
          stored = true;
        }
    }
  if (stored)
    ret[0] = ret0;

but as you can see, the unconditional load of ret[0] before the loop breaks
this: it can trap on paths where the original code never accesses ret[0].
LIM would need to be changed to handle the whole load-update-store sequence,
delaying the load as well (thereby re-associating the reduction).
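
As a rough sketch of what delaying the load could look like at the source
level: the function name and signature below are hypothetical (the original
testcase is not quoted in this comment), and the partial sum is kept in
unsigned arithmetic on the assumption that re-association must not introduce
new signed overflow.

```c
/* Hypothetical signature, assumed for illustration only.  */
void
sum_delayed_load (int *ret, const int *x, int n, int start, int m)
{
  unsigned int sum = 0;  /* partial sum starts at 0, not at ret[0] */
  int stored = 0;        /* set once the conditional path is taken */

  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        {
          sum += (unsigned int) x[i];
          stored = 1;
        }
    }

  /* ret[0] is loaded and stored only if the original loop would have
     accessed it, so no new trap is introduced.  The conversion back to
     int wraps modulo 2^N with GCC.  */
  if (stored)
    ret[0] = (int) ((unsigned int) ret[0] + sum);
}
```

The loop body is now a plain conditional reduction into a local, and ret[0]
is only touched after the loop.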

An alternative would be to split the loop and apply store-motion to the tail.

    int i;
    for (i = 0; i < n; i++)
      {
        int pos = start + i;
        if (pos <= m)
          break;
      }
    if (i < n)
      {
        ret0 = ret[0];
        for (i = 0; i < n; i++)
          {
            int pos = start + i;
            if (pos <= m)
              ret0 += x[i];
          }
        ret[0] = ret0;
      }

we can then vectorize the second loop.

At the source level the fix is to make sure the load from ret[0] doesn't trap.
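
One possible reading of that, as a minimal sketch: load ret[0] unconditionally
into a local before the loop and store it back afterwards.  The function name
and signature are hypothetical, and this is only valid when the caller
guarantees that ret points to readable and writable memory even if no
iteration satisfies the condition.

```c
/* Hypothetical signature, assumed for illustration only.
   Precondition (assumption): ret[0] is always safe to read and write.  */
void
sum_unconditional_load (int *ret, const int *x, int n, int start, int m)
{
  int ret0 = ret[0];  /* unconditional load, written by the programmer */

  for (int i = 0; i < n; i++)
    {
      int pos = start + i;
      if (pos <= m)
        ret0 += x[i];  /* conditional reduction into a local */
    }

  ret[0] = ret0;      /* unconditional store after the loop */
}
```

Unlike the original loop, this always reads and writes ret[0], so it is a
change in behaviour that only the source author can decide is acceptable.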


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
