http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56878



             Bug #: 56878
           Summary: Issue with candidate choice in
                    vect_gen_niters_for_prolog_loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ysrum...@gmail.com





We found a 7% performance drop on 482.sphinx3 from SPEC2006 with -march=corei7
and -mavx, which appeared after the fix in r196872.



The problem can be reproduced with the attached testcase.

After r196872, function vect_gen_niters_for_prolog_loop() uses a non-invariant
pointer (v1) to calculate the number of prolog iterations. Before the fix it
used an invariant pointer (x), so all of these computations could be hoisted
out of the outer loop:



before fix

  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_px.4_4 = x_24(D);
  _119 = (unsigned long) vect_px.4_4;
  _118 = _119 & 31;
  _117 = _118 >> 2;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 7;
  prolog_loop_niters.5_52 = MIN_EXPR <niters.3_17, _114>;



Note that all of these assignments can be hoisted out of the loop.
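
For reference, the arithmetic in the dump above can be written out in C. This
is only a sketch of what the GIMPLE sequence computes, assuming 32-byte vectors
of 4-byte elements (which the masks 31 and 7 and the shift by 2 suggest). The
function name prolog_niters is made up for illustration and is not something
GCC emits.

```c
#include <stdint.h>

/* Sketch of the prolog iteration count from the dump: the number of
   scalar iterations to peel so that the address becomes 32-byte aligned,
   capped by the loop trip count.  Assumes 4-byte elements. */
static unsigned int prolog_niters(uintptr_t addr, unsigned int niters)
{
    unsigned long a = (unsigned long) addr;      /* _119 */
    unsigned long misalign = a & 31;             /* _118: byte offset in vector */
    unsigned long elems = misalign >> 2;         /* _117: offset in elements */
    unsigned long neg = -elems;                  /* _116: unsigned negation */
    unsigned int peel = (unsigned int) neg & 7;  /* _115, _114 */
    return niters < peel ? niters : peel;        /* MIN_EXPR */
}
```

For example, an address of 0x1004 is 4 bytes past a 32-byte boundary, so 7
more 4-byte elements have to be peeled to reach 0x1020. None of this depends
on anything but the pointer and the trip count, which is why it is invariant
whenever the pointer is.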



after fix

  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_pv1.4_4 = v1_16;
  _119 = (unsigned long) vect_pv1.4_4;

where v1 is not loop invariant.



If the trip count of the outer loop is huge and the trip count of the inner
loop is small, whether this code motion happens can affect performance
dramatically.
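
The attached testcase is not reproduced in this message. The following is a
hypothetical loop nest of the same general shape, with made-up names
(dist_all, vecs, nvec), in which x is invariant in the outer loop and v1
changes on every outer iteration. It only illustrates the situation described
above; whether GCC actually picks v1 for this exact code depends on the
compiler revision and options.

```c
#include <stddef.h>

/* Hypothetical kernel, not the attached testcase: x is the same on every
   outer iteration, while v1 points at a different row each time. */
float dist_all(const float *x, const float *vecs, size_t nvec, int len)
{
    float total = 0.0f;
    for (size_t n = 0; n < nvec; n++) {            /* huge trip count */
        const float *v1 = vecs + n * (size_t) len; /* not loop invariant */
        float sum = 0.0f;
        for (int i = 0; i < len; i++) {            /* small trip count */
            float d = x[i] - v1[i];
            sum += d * d;
        }
        total += sum;
    }
    return total;
}
```

If the prolog iteration count is derived from x, it can be computed once
before the outer loop. If it is derived from v1, it has to be recomputed on
each of the nvec outer iterations, and that overhead is not amortized when
len is small.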

To reproduce, compile the attached test on x86 with the following options:



-O3 -funroll-loops -ffast-math -march=corei7 -mavx
