https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80718

            Bug ID: 80718
           Summary: GCC generates slow code for offsettable vec_duplicate
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

While looking at bug 80697, I noticed that on power8 the generated code loads
the value into a GPR and then uses a direct move to get it into a vector
register.

I tracked this down to the load-with-splat instruction (lxvdsx) accepting only
indirect or indexed addresses, while the original address is offsettable
(base register plus constant).  So the register allocator decides to load the
value into a GPR and transfer it to the vector register to do the
vec_duplicate operation.

For example:
vector double foo (double *p) { return (vector double) { p[4], p[4] }; }

generates:
foo:
        ld 9,32(3)
        mtvsrd 34,9
        xxpermdi 34,34,34,0
        blr

I tested adding a combiner pattern to support offsettable addresses, and with
it the compiler generates:
foo:
        li 9,32
        lxvdsx 34,3,9
        blr
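
To compare the three address forms mentioned above side by side, a small test
file along the following lines may help.  This is only a sketch: the function
names and the v2df typedef are made up for illustration, and it uses GCC's
generic vector_size extension rather than the Altivec `vector double' keyword
so that it also compiles on targets other than PowerPC.

```c
/* Generic GCC vector type standing in for `vector double' (an assumption
   made here so the file builds without Altivec support).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Indirect address: base register only.  */
v2df
splat_indirect (double *p)
{
  return (v2df) { p[0], p[0] };
}

/* Indexed address: base register plus index register.  */
v2df
splat_indexed (double *p, long i)
{
  return (v2df) { p[i], p[i] };
}

/* Offsettable address: base register plus constant, as in foo above.  */
v2df
splat_offset (double *p)
{
  return (v2df) { p[4], p[4] };
}
```

Compiling this with something like -O2 -mcpu=power8 -S and looking at the
assembly should show whether each function uses lxvdsx directly or goes
through a GPR; the expectation from the analysis above is that only
splat_offset takes the slow path.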
