https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

            Bug ID: 112325
           Summary: Missed vectorization after cunrolli
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

testcase:

#include <stdint.h>
#include <string.h>

typedef struct {
    float s;
    int8_t qs[32];
} block;

void foo (const int n, float * restrict s, const int8_t q[4],
          const block * restrict y) {
    const int qk = 32;
    const int nb = n / qk;

    float sumf = 0.0;
    int sumi = 0;

    for (int i = 0; i < nb; i++) {
        uint32_t qh;
        memcpy(&qh, q, 4);

        for (int j = 0; j < qk/2; ++j) {
            sumi += (qh >> j) * y[i].qs[j];
        }
        sumf += (y[i].s * (float) sumi);
    }
    *s = sumf;
}

The loop is vectorized with -O2 -mavx512vl but not with -O3 -mavx512vl; see
https://godbolt.org/z/csPr4cPen

With -O3 -mavx512vl -fdisable-tree-cunrolli the loop is vectorized again, so
the missed vectorization appears to be triggered by the early complete-unrolling
pass (cunrolli).
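
For illustration only, here is a source-level sketch of what complete unrolling
of the inner loop (16 iterations, since qk/2 == 16) roughly amounts to. The
helper names inner_rolled and inner_unrolled are hypothetical and not part of
the testcase, and the GIMPLE that cunrolli actually produces may look
different. The assumption is that after unrolling the vectorizer no longer sees
a loop over j and would have to handle the 16 straight-line statements instead.

```c
#include <stdint.h>

/* Rolled form: the inner loop of foo, with the running sum passed in
   and returned (hypothetical refactoring for illustration). */
int inner_rolled (int sumi, uint32_t qh, const int8_t qs[32])
{
    for (int j = 0; j < 32 / 2; ++j)
        sumi += (qh >> j) * qs[j];
    return sumi;
}

/* Hand-unrolled form: the same 16 updates written as straight-line
   code, approximating the shape left behind by complete unrolling. */
int inner_unrolled (int sumi, uint32_t qh, const int8_t qs[32])
{
    sumi += (qh >>  0) * qs[0];
    sumi += (qh >>  1) * qs[1];
    sumi += (qh >>  2) * qs[2];
    sumi += (qh >>  3) * qs[3];
    sumi += (qh >>  4) * qs[4];
    sumi += (qh >>  5) * qs[5];
    sumi += (qh >>  6) * qs[6];
    sumi += (qh >>  7) * qs[7];
    sumi += (qh >>  8) * qs[8];
    sumi += (qh >>  9) * qs[9];
    sumi += (qh >> 10) * qs[10];
    sumi += (qh >> 11) * qs[11];
    sumi += (qh >> 12) * qs[12];
    sumi += (qh >> 13) * qs[13];
    sumi += (qh >> 14) * qs[14];
    sumi += (qh >> 15) * qs[15];
    return sumi;
}
```

Both helpers compute the same value for the same inputs; only the shape of the
code handed to later passes differs.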
