http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55295



             Bug #: 55295
           Summary: [SH] Add support for fipr instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: olege...@gcc.gnu.org
            Target: sh4*-*-*





Created attachment 28671
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28671
Example combine patterns



SH4* targets have an instruction, 'fipr', which GCC currently does not use.
It calculates the dot product of two V4SF vectors:



fipr  FVm, FVn
FR(n+3) = FR(m+0)*FR(n+0) + FR(m+1)*FR(n+1) + FR(m+2)*FR(n+2) + FR(m+3)*FR(n+3)
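The register-level semantics above can be sketched as a small software model.
This is only an illustration: the names fr_bank and fipr_model are made up
for this sketch, the register bank is a plain array, and the rounding
behaviour of the real hardware is not modelled.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of the FPU register bank FR0..FR15 (not a GCC type).
using fr_bank = std::array<float, 16>;

// Models "fipr FVm, FVn".  m and n are the numbers of the first register
// of each vector, i.e. 0, 4, 8 or 12 for FV0, FV4, FV8, FV12.
// Hardware rounding/accuracy is deliberately not modelled here.
void fipr_model (fr_bank& fr, int m, int n)
{
  assert (m >= 0 && m <= 12 && m % 4 == 0);
  assert (n >= 0 && n <= 12 && n % 4 == 0);

  // All four products are summed before the result register is written,
  // so reading FR(n+3) as an input is safe.
  float sum = 0.0f;
  for (int i = 0; i < 4; ++i)
    sum += fr[m + i] * fr[n + i];

  // The result replaces the last element of the second operand.
  fr[n + 3] = sum;
}
```

For example, with FR4..FR7 = 1, 2, 3, 4 and FR8..FR11 = 5, 6, 7, 8, calling
fipr_model (fr, 4, 8) leaves 70 in fr[11] and does not modify FR4..FR7.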



Some (C++) code that could utilize this:



typedef float v4sf __attribute__ ((vector_size (16)));

float test00 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}

float test02 (float a0, float a1, float a2, float a3,
              float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}

float test03 (const float* a, const float* b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}



Dot products of 3-element vectors could also be handled by the fipr insn by
setting the unused fourth element to 0.0 in one of the vector operands.  For
2-element vectors an fmul, fmac sequence seems adequate (and already works).
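The zero-padding idea for 3-element vectors can be illustrated at the source
level as follows.  This is a sketch only: dot4 and dot3_via_dot4 are
hypothetical helper names, and whether the compiler would actually emit fipr
for this shape is exactly what this report is asking for.  Note that the
fourth element of the other operand must still be finite, since 0.0 * Inf and
0.0 * NaN are NaN.

```cpp
#include <cassert>

typedef float v4sf __attribute__ ((vector_size (16)));

// 4-element dot product, the shape a fipr pattern would have to match.
static float dot4 (const v4sf& a, const v4sf& b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Hypothetical helper: 3-element dot product expressed via dot4.
// a and b each point to at least 3 floats.
float dot3_via_dot4 (const float* a, const float* b)
{
  assert (a != nullptr && b != nullptr);

  // Fourth lane of one operand is forced to 0.0 so it contributes nothing.
  v4sf va = { a[0], a[1], a[2], 0.0f };

  // Fourth lane of the other operand is arbitrary but must be finite.
  v4sf vb = { b[0], b[1], b[2], 1.0f };

  return dot4 (va, vb);
}
```

With a = {1, 2, 3} and b = {4, 5, 6} this returns 32, the same as the plain
3-element sum of products.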



I've tried adding some combine patterns to handle the V4SF case (see
attachment), but the results are not convincing.  For example, the case



float test02 (float a0, float a1, float a2, float a3,
              float b0, float b1, float b2, float b3)
{
  return a0 * b0 + a1 * b1 + a2 * b2 + a3 * b3;
}



compiled with -O2 -m4-single -mb results in:



        fmov.s  fr12,@-r15      ! 42    movsf_ie/7    [length = 2]
        fmov.s  fr13,@-r15      ! 43    movsf_ie/7    [length = 2]
        fmov.s  fr14,@-r15      ! 44    movsf_ie/7    [length = 2]
        fmov.s  fr15,@-r15      ! 45    movsf_ie/7    [length = 2]
        fmov    fr9,fr12        ! 31    movsf_ie/1    [length = 2]
        fmov    fr8,fr13        ! 32    movsf_ie/1    [length = 2]
        fmov    fr11,fr14       ! 33    movsf_ie/1    [length = 2]
        fmov    fr10,fr15       ! 34    movsf_ie/1    [length = 2]
        fmov    fr5,fr0         ! 27    movsf_ie/1    [length = 2]
        fmov    fr4,fr1         ! 28    movsf_ie/1    [length = 2]
        fmov    fr7,fr2         ! 29    movsf_ie/1    [length = 2]
        fmov    fr6,fr3         ! 30    movsf_ie/1    [length = 2]
        fipr    fv12,fv0        ! 35    fipr_compact    [length = 2]
        fmov.s  @r15+,fr15      ! 50    movsf_ie/6    [length = 2]
        fmov.s  @r15+,fr14      ! 51    movsf_ie/6    [length = 2]
        fmov    fr3,fr0         ! 36    movsf_ie/1    [length = 2]
        fmov.s  @r15+,fr13      ! 52    movsf_ie/6    [length = 2]
        rts                     ! 54    *return_i    [length = 2]
        fmov.s  @r15+,fr12      ! 53    movsf_ie/6    [length = 2]



whereas the expected code would be just:



        fipr    fv4,fv8
        rts
        fmov    fr11,fr0







Also, in the case of



float test01 (const v4sf& a, const v4sf& b, const v4sf& c)
{
  float x = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
  float y = c[0] * b[0] + c[1] * b[1] + c[2] * b[2] + c[3] * b[3];
  return x + y;
}



only one fipr insn is generated, due to various other optimization effects.



It seems there is no standard name pattern for doing FP vector dot products
yet.  I guess it would be better to also have some tree-optimization support
for this.
