https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68238

            Bug ID: 68238
           Summary: Vector cost model overestimates prologue cost for
                    SLPed code
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---
              Host: *-*-*
            Target: *-*-*

Created attachment 36663
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36663&action=edit
reduced testcase showing high costs analysis

The attached testcase is derived from a benchmark which shows a performance
regression under GCC 4.9 and GCC 5.2. At the root of the regression is the
runtime profitability calculation, which decides whether to execute the scalar
or the vector code path. GCC 4.9 and 5.2 both return a much higher estimate of
the minimum number of iterations needed for the vector code path to be
profitable. Consequently, low values of "size" are sent down the scalar path
and show a drop in performance on the order of the number of vector lanes your
target can load.

I'm compiling the testcase (on x86_64-none-linux-gnu or aarch64-none-linux-gnu
- though AArch64 vector costs are unreliable in 4.9 and 5.2) with:

  <gcc> -O3 slp-costs.c

On my (x86_64) system, with GCC 4.8.2, the cost analysis looks like:

slp-costs.c:7: note: Cost model analysis: 
  Vector inside of loop cost: 32
  Vector prologue cost: 10
  Vector epilogue cost: 0
  Scalar iteration cost: 64
  Scalar outside cost: 1
  Vector outside cost: 10
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1

On my (x86_64) system, with GCC 5.2, the cost analysis looks like:

slp-costs.c:7:3: note: Cost model analysis: 
  Vector inside of loop cost: 32
  Vector prologue cost: 1033
  Vector epilogue cost: 0
  Scalar iteration cost: 64
  Scalar outside cost: 1
  Vector outside cost: 1033
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 33

Trunk starts to get this right again after r228751. I had a look at
backporting that patch, but it uses some of the new hash-table stuff, so it
won't be a trivial backport. On trunk the cost analysis looks like:

slp-costs.c:7:3: note: Cost model analysis: 
  Vector inside of loop cost: 32
  Vector prologue cost: 10
  Vector epilogue cost: 0
  Scalar iteration cost: 64
  Scalar outside cost: 1
  Vector outside cost: 10
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
slp-costs.c:7:3: note:   Runtime profitability threshold = 0
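
To make the effect of the prologue cost concrete, here is a minimal sketch of
the minimum-iterations arithmetic as I understand it from
vect_estimate_min_profitable_iters. It is not the actual GCC code: the helper
name and parameter names are mine, and the vectorization factor of 1 is an
assumption (it is not printed in the dumps above, but it is consistent with
the runtime threshold of 0 reported on trunk).

```c
#include <assert.h>

/* Hypothetical helper, not GCC's implementation.
   sic = scalar iteration cost, soc = scalar outside cost,
   vic = vector inside-of-loop cost, voc = vector outside cost,
   vf  = vectorization factor (assumed; not shown in the dumps),
   pl / ep = prologue / epilogue iterations.  */
static int min_profitable_iters(int sic, int soc, int vic, int voc,
                                int vf, int pl, int ep)
{
    /* Only meaningful when one vector iteration is cheaper than the
       vf scalar iterations it replaces.  */
    assert(sic * vf > vic);

    if (voc <= 0)
        return 1;

    /* Solve  sic * n + soc > vic * ((n - pl - ep) / vf) + voc  for n,
       using integer division as a first estimate.  */
    int iters = ((voc - soc) * vf - vic * pl - vic * ep)
                / (sic * vf - vic);

    /* Bump by one if the vector path is not yet strictly cheaper.  */
    if (sic * vf * iters <= vic * iters + (voc - soc) * vf)
        iters++;

    return iters;
}
```

With the numbers from the dumps and vf = 1, a prologue (outside) cost of 10
gives min_profitable_iters(64, 1, 32, 10, 1, 0, 0) == 1, while a cost of 1033
gives min_profitable_iters(64, 1, 32, 1033, 1, 0, 0) == 33, matching the 4.8.2
and 5.2 results respectively. The inside-of-loop and scalar costs are
identical in both cases, so the inflated prologue cost alone accounts for the
higher threshold.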
