https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80718
            Bug ID: 80718
           Summary: GCC generates slow code for offsettable vec_duplicate
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

While looking at bug 80697, I noticed that on power8 the generated code
loaded values into a GPR and then used direct moves to get them into vector
registers. I tracked this down to the load-with-splat instruction accepting
only indirect or indexed addresses, while the original address is
offsettable. As a result, the register allocator decides to load the value
into a GPR and transfer it to the vector register to do the vec_duplicate
operation.

For example:

    vector double foo (double *p)
    {
      return (vector double) { p[4], p[4] };
    }

generates:

    foo:
            ld 9,32(3)
            mtvsrd 34,9
            xxpermdi 34,34,34,0
            blr

I tested adding a combiner pattern to support offsettable loads, and with it
the compiler generates:

    foo:
            li 9,32
            lxvdsx 34,3,9
            blr
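
For anyone who wants to compare the two addressing forms side by side, here is a minimal sketch, not part of the original report. It assumes GCC's generic vector_size attribute as a stand-in for the AltiVec "vector double" type so that it also builds on non-PowerPC hosts. The names v2df, splat_offset, splat_indexed and lanes_equal are hypothetical and chosen only for illustration.

```c
/* Assumption: generic GCC/Clang vector extension used in place of
   the AltiVec "vector double" type from the report.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Offsettable form, as in the report: the element sits at a constant
   displacement (4 * sizeof (double) = 32 bytes) from the base pointer.  */
v2df
splat_offset (const double *p)
{
  return (v2df) { p[4], p[4] };
}

/* Indexed form: the element index arrives in a register, so the address
   is naturally base register plus index register.  */
v2df
splat_indexed (const double *p, long i)
{
  return (v2df) { p[i], p[i] };
}

/* Hypothetical helper: nonzero when both lanes of V equal X.  */
int
lanes_equal (v2df v, double x)
{
  return v[0] == x && v[1] == x;
}
```

Both functions return the same vector when i is 4; only the address form presented to the compiler differs. Compiling with something like "gcc -O2 -mcpu=power8 -S" on a PowerPC target should make it possible to compare the code generated for each.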