https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101390

            Bug ID: 101390
           Summary: Expand vector mod as vector div + multiply-subtract
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

When the target supports an sdiv/udiv pattern for vector modes we could
synthesise a vector modulus operation using the division and a
multiply-subtract operation, since a % b == a - (a / b) * b.  For example:

#define N 128

extern signed int si_a[N], si_b[N], si_c[N];

void
test_si ()
{
  for (int i = 0; i < N; i++)
    si_c[i] = si_a[i] % si_b[i];
}
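
The expansion being asked for can be sketched in scalar C as below.  This is
only an illustration of the identity, not what the vectoriser emits; the
function name mod_via_div and its parameters are hypothetical and do not
exist in GCC.  It assumes C's truncating division and skips the divisor == 0
and INT_MIN / -1 cases, which are undefined for both / and %.

```c
#include <assert.h>

/* Hypothetical helper: compute c[i] = a[i] % b[i] without using %,
   as one division followed by a multiply-subtract.  */
void
mod_via_div (const int *a, const int *b, int *c, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* Division by zero is undefined for % as well, so reject it here.  */
      assert (b[i] != 0);

      /* Truncating quotient; this is the step SDIV performs.  */
      int q = a[i] / b[i];

      /* Multiply-subtract: remainder = dividend - quotient * divisor.  */
      c[i] = a[i] - q * b[i];
    }
}
```

Because the quotient truncates toward zero, the result takes the sign of the
dividend, matching the % operator.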

On AArch64, SVE (but not Neon) has vector SDIV/UDIV instructions, so GCC
could generate:

.L2:
        ld1w    z2.s, p0/z, [x4, x0, lsl 2]
        ld1w    z1.s, p0/z, [x3, x0, lsl 2]
        movprfx z0, z2
        sdiv    z0.s, p1/m, z0.s, z1.s
        msb     z0.s, p1/m, z1.s, z2.s
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2

This can be achieved by implementing the smod and umod optabs in the aarch64
backend for SVE, but the transformation is generic, so it could be handled
more generally in vect_recog_divmod_pattern and/or the vector lowering code
so that more targets can benefit.
