http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295
Bug #: 55295
Summary: [SH] Add support for fipr instruction
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: olege...@gcc.gnu.org
Target: sh4*-*-*

Created attachment 28671
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28671
Example combine patterns

On SH4* targets there is a currently unused instruction 'fipr' which can be
used to calculate the dot product of two V4SF vectors:

fipr FVm, FVn

FR(n+3) = FR(m+0)*FR(n+0) + FR(m+1)*FR(n+1)
          + FR(m+2)*FR(n+2) + FR(m+3)*FR(n+3)

Some (C++) code that could utilize this:

typedef float v4sf __attribute__ ((vector_size (16)));

float test00 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

float test02 (float a0, float a1, float a2, float a3,
              float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

float test03 (const float* a, const float* b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Dot products of vectors with 3 elements could also be handled by the fipr insn
by setting the irrelevant element to 0.0 in one of the vector operands.
For 2 element vectors an fmul,fmac sequence seems to be adequate (which already
works).

I've tried adding some combine patterns to handle the V4SF case (see
attachment), but the results are not so convincing.
For example, the case

float test02 (float a0, float a1, float a2, float a3,
              float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

compiled with -O2 -m4-single -mb results in:

        fmov.s  fr12,@-r15      ! 42    movsf_ie/7      [length = 2]
        fmov.s  fr13,@-r15      ! 43    movsf_ie/7      [length = 2]
        fmov.s  fr14,@-r15      ! 44    movsf_ie/7      [length = 2]
        fmov.s  fr15,@-r15      ! 45    movsf_ie/7      [length = 2]
        fmov    fr9,fr12        ! 31    movsf_ie/1      [length = 2]
        fmov    fr8,fr13        ! 32    movsf_ie/1      [length = 2]
        fmov    fr11,fr14       ! 33    movsf_ie/1      [length = 2]
        fmov    fr10,fr15       ! 34    movsf_ie/1      [length = 2]
        fmov    fr5,fr0         ! 27    movsf_ie/1      [length = 2]
        fmov    fr4,fr1         ! 28    movsf_ie/1      [length = 2]
        fmov    fr7,fr2         ! 29    movsf_ie/1      [length = 2]
        fmov    fr6,fr3         ! 30    movsf_ie/1      [length = 2]
        fipr    fv12,fv0        ! 35    fipr_compact    [length = 2]
        fmov.s  @r15+,fr15      ! 50    movsf_ie/6      [length = 2]
        fmov.s  @r15+,fr14      ! 51    movsf_ie/6      [length = 2]
        fmov    fr3,fr0         ! 36    movsf_ie/1      [length = 2]
        fmov.s  @r15+,fr13      ! 52    movsf_ie/6      [length = 2]
        rts                     ! 54    *return_i       [length = 2]
        fmov.s  @r15+,fr12      ! 53    movsf_ie/6      [length = 2]

whereas the expected code is just:

        fipr    fv4,fv8
        rts
        fmov    fr11,fr0

The arguments already arrive in fr4..fr11, so the save/restore of fr12..fr15
and the twelve register-to-register moves are all unnecessary.

Also, in the case of

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

only one fipr insn is generated, due to various other optimization effects.

It seems there is no standard name pattern for doing FP vector dot products
yet. I guess it would be better to also have some tree-optimization support
for this.
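
For reference, here is a minimal C++ sketch of the fipr semantics described
above, together with the 3-element case that pads the irrelevant element with
0.0. The names fipr_model and dot3_via_fipr are hypothetical and exist only
for this illustration; they are not part of GCC or of the attached patch. The
model also ignores any rounding differences the real instruction may have.

```cpp
#include <array>
#include <cassert>

// Hypothetical software model of 'fipr FVm, FVn' (illustration only).
// 'fr' stands in for the 16 single-precision registers FR0..FR15;
// m and n are the first register numbers of the two vector operands.
void fipr_model (std::array<float, 16>& fr, int m, int n)
{
  // Vector registers FV0, FV4, FV8, FV12 start at multiples of 4.
  assert (m >= 0 && m < 16 && m % 4 == 0);
  assert (n >= 0 && n < 16 && n % 4 == 0);

  // FR(n+3) = FR(m+0)*FR(n+0) + ... + FR(m+3)*FR(n+3)
  fr[n + 3] = fr[m + 0] * fr[n + 0] + fr[m + 1] * fr[n + 1]
              + fr[m + 2] * fr[n + 2] + fr[m + 3] * fr[n + 3];
}

// Hypothetical helper: 3-element dot product done with the 4-element
// operation by setting the unused element of one operand to 0.0.
float dot3_via_fipr (const float* a, const float* b)
{
  std::array<float, 16> fr {};

  for (int i = 0; i < 3; ++i)
    {
      fr[4 + i] = a[i];   // first operand in FV4
      fr[8 + i] = b[i];   // second operand in FV8
    }

  fr[4 + 3] = 0.0f;       // irrelevant element forced to zero in one operand
  fr[8 + 3] = 123.0f;     // arbitrary finite value, cancelled by the 0.0 above

  fipr_model (fr, 4, 8);
  return fr[8 + 3];       // result lands in FR(n+3), i.e. fr11
}
```

Note that the padding trick in this sketch relies on the other operand's
fourth element being finite, since 0.0 times an infinity or NaN would not
yield 0.0.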