https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
            Bug ID: 112325
           Summary: Missed vectorization after cunrolli
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
  Target Milestone: ---

testcase:

#include <stdint.h>
#include <string.h>

typedef struct {
  float s;
  int8_t qs[32];
} block;

void foo (const int n, float * restrict s, const int8_t q[4],
          const block * restrict y)
{
  const int qk = 32;
  const int nb = n / qk;
  float sumf = 0.0;
  int sumi = 0;

  for (int i = 0; i < nb; i++)
    {
      uint32_t qh;
      memcpy(&qh, q, 4);

      for (int j = 0; j < qk/2; ++j)
        {
          sumi += (qh >> j) * y[i].qs[j];
        }

      sumf += (y[i].s * (float) sumi);
    }

  *s = sumf;
}

This can be vectorized under -O2 -mavx512vl but not under -O3 -mavx512vl, see
https://godbolt.org/z/csPr4cPen

Under -O3 -mavx512vl -fdisable-tree-cunrolli the loop can also be vectorized.
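For anyone reproducing this locally, here is a minimal sketch of a harness around the testcase. It is not part of the original report. The helpers make_block and run_foo and the chosen input values are hypothetical, picked only so the expected sums can be checked by hand. The aim is to confirm that foo returns the same values under each flag combination while comparing the generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  float s;
  int8_t qs[32];
} block;

/* Kernel from the report, unchanged apart from layout. */
void foo (const int n, float * restrict s, const int8_t q[4],
          const block * restrict y)
{
  const int qk = 32;
  const int nb = n / qk;
  float sumf = 0.0;
  int sumi = 0;   /* note: not reset per block, it accumulates across i */

  for (int i = 0; i < nb; i++)
    {
      uint32_t qh;
      memcpy(&qh, q, 4);
      for (int j = 0; j < qk/2; ++j)
        sumi += (qh >> j) * y[i].qs[j];
      sumf += (y[i].s * (float) sumi);
    }
  *s = sumf;
}

/* Hypothetical helper: a block with scale `scale` whose only nonzero
   weight is qs[15] = w, so each block contributes (qh >> 15) * w. */
static block make_block (float scale, int8_t w)
{
  block b;
  memset (&b, 0, sizeof b);
  b.s = scale;
  b.qs[15] = w;
  return b;
}

/* Hypothetical helper: runs foo over up to four identical blocks.
   q = {1,1,1,1} gives qh == 0x01010101 on any byte order, and
   0x01010101 >> 15 == 514.  A negative w relies on GCC's modulo
   conversion from unsigned to int when the sum is stored in sumi. */
float run_foo (int n, float scale, int8_t w)
{
  static const int8_t q[4] = { 1, 1, 1, 1 };
  block y[4];
  float out = -1.0f;

  assert (n >= 0 && n <= 128);   /* y holds at most 4 blocks of 32 */
  for (int k = 0; k < 4; k++)
    y[k] = make_block (scale, w);
  foo (n, &out, q, y);
  return out;
}
```

With these inputs, run_foo (64, 1.0f, 1) should give 514 + 1028 = 1542, since sumi carries over from the first block to the second. Adding a small main that calls run_foo and building it once per flag set from the report (-O2 -mavx512vl, -O3 -mavx512vl, and -O3 -mavx512vl -fdisable-tree-cunrolli) checks that the results agree. GCC's -fopt-info-vec option can additionally be passed to see which loops were vectorized in each build.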