http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56717



             Bug #: 56717

           Summary: Enhance Dot-product pattern recognition to avoid mult widening.

    Classification: Unclassified

           Product: gcc

           Version: 4.9.0

            Status: UNCONFIRMED

          Severity: normal

          Priority: P3

         Component: tree-optimization

        AssignedTo: unassig...@gcc.gnu.org

        ReportedBy: ysrum...@gmail.com





Comparing the performance of the icc and gcc compilers, we found that for one
important benchmark from the EEMBC 1.1 suite gcc produces much poorer code than
icc. The deficiency can be illustrated by the following simple example:



typedef signed short s16;
typedef signed long  s32;

void bar (s16 *in1, s16 *in2, s16 *out, int n, s16 scale)
{
  int i;
  s32 acc = 0;

  for (i=0; i<n; i++)
    acc += ((s32) in1[i] * (s32) in2[i]) >> scale;
  *out = (s16) acc;
}

gcc performs the mult widening conversion for it, which does not look reasonable
and leads to suboptimal code, at least on x86.
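
For contrast, here is a minimal sketch of the two loop shapes involved. The
names dot_plain and dot_scaled are hypothetical, and int32_t stands in for
signed long so that the accumulator is 32 bits on every target (an assumption,
not part of the test case above). The first loop is the plain form that
dot-product recognition is meant to match; the second only adds the shift
between the multiply and the accumulation.

```c
#include <stdint.h>

typedef int16_t s16;
typedef int32_t s32;   /* assumption: fixed 32-bit accumulator */

/* Plain dot product: widen both operands, multiply, accumulate. */
s16 dot_plain (const s16 *in1, const s16 *in2, int n)
{
  s32 acc = 0;
  for (int i = 0; i < n; i++)
    acc += (s32) in1[i] * (s32) in2[i];
  return (s16) acc;
}

/* Same loop with a right shift between the multiply and the add,
   as in the reported test case.  */
s16 dot_scaled (const s16 *in1, const s16 *in2, int n, s16 scale)
{
  s32 acc = 0;
  for (int i = 0; i < n; i++)
    acc += ((s32) in1[i] * (s32) in2[i]) >> scale;
  return (s16) acc;
}
```

With scale equal to 0 the two functions return the same value, so the only
difference the vectorizer sees is the extra statement on the product.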



I assume that Dot-product pattern recognition can simply be enhanced to accept
such a case by allowing the following stmts:



     type x_t, y_t;
     TYPE1 prod;
     TYPE2 sum = init;
   loop:
     sum_0 = phi <init, sum_1>
     S1  x_t = ...
     S2  y_t = ...
     S3  x_T = (TYPE1) x_t;
     S4  y_T = (TYPE1) y_t;
     S5  prod = x_T * y_T;
     [S6  prod = (TYPE2) prod;  #optional]
     S6' prod1 = prod <bin-op> <opnd>
     S7  sum_1 = prod1 + sum_0;



where S6' is vectorizable.
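
To make the intended semantics concrete, here is a rough scalar emulation of
what a vectorized form of this sequence would compute. It is only a sketch
under stated assumptions: LANES and bar_lanes are hypothetical names, the
bin-op is taken to be the right shift from the test case, and the result is
returned rather than stored through a pointer. It says nothing about which
instructions gcc would actually select.

```c
#include <stdint.h>

#define LANES 4   /* hypothetical vector width, in 32-bit elements */

int16_t bar_lanes (const int16_t *in1, const int16_t *in2, int n, int16_t scale)
{
  int32_t lane_acc[LANES] = {0};
  int i = 0;

  /* Vector body: S3-S5 widen and multiply, S6' applies the shift,
     S7 accumulates, all independently per lane.  */
  for (; i + LANES <= n; i += LANES)
    for (int l = 0; l < LANES; l++)
      {
        int32_t prod = (int32_t) in1[i + l] * (int32_t) in2[i + l];
        int32_t prod1 = prod >> scale;
        lane_acc[l] += prod1;
      }

  /* Reduction epilogue: fold the per-lane sums into one scalar.  */
  int32_t acc = 0;
  for (int l = 0; l < LANES; l++)
    acc += lane_acc[l];

  /* Scalar tail for the remaining n % LANES elements.  */
  for (; i < n; i++)
    acc += ((int32_t) in1[i] * (int32_t) in2[i]) >> scale;

  return (int16_t) acc;
}
```

Because S6' is applied to each product before the accumulation, the per-lane
sums can be reordered freely as long as the 32-bit accumulator does not
overflow, which is why the statement only needs to be vectorizable.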
