[Bug tree-optimization/81038] [8 regression] test case g++.dg/vect/slp-pr56812.cc fails starting with r248678

2018-02-08 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

Bill Schmidt  changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Status|ASSIGNED                      |RESOLVED
      Resolution|---                           |FIXED

--- Comment #10 from Bill Schmidt  ---
Richi instead committed the more elegant patch from
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00397.html.  Per Richi, fixed on
x86_64.  I've observed a testresults cycle for powerpc64-linux-gnu where this
now passes, so looks fixed to me.
Thanks!

2018-02-02 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

--- Comment #9 from Bill Schmidt  ---
Prospective patch posted at
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00137.html.

2018-02-02 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

--- Comment #8 from Bill Schmidt  ---
The commentary for r248678 reads in part: "Compute costs for doing no peeling
at all, compare to the best peeling costs so far and avoid peeling if cheaper."
 Indeed, if you look at the vect dump for r248677, you see that the vectorizer
decides to force alignment using peeling, even though the target processor has
efficient unaligned memory access.  With peeling, the cost model judged
vectorization unprofitable:

/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: Cost model analysis:
  Vector inside of loop cost: 1
  Vector prologue cost: 7
  Vector epilogue cost: 6
  Scalar iteration cost: 1
  Scalar outside cost: 0
  Vector outside cost: 13
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 17
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Runtime profitability threshold = 16
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Static estimate profitability threshold = 16
/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: not vectorized: vectorization not profitable.

In the vect dump for r248678, the vectorizer isn't overly focused on peeling,
and determines that it can use the efficient unaligned storage accesses.  This
leads to the more reasonable cost calculation:

/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: Cost model analysis:
  Vector inside of loop cost: 1
  Vector prologue cost: 1
  Vector epilogue cost: 0
  Scalar iteration cost: 1
  Scalar outside cost: 0
  Vector outside cost: 1
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 2
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Runtime profitability threshold = 3
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note:   Static estimate profitability threshold = 3
/home/wschmidt/gcc/gcc-mainline-test/gcc/testsuite/g++.dg/vect/slp-pr56812.cc:16:18: note: loop vectorized
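
The numbers in the two cost dumps above can be reproduced by hand.  The sketch
below is a simplified rendering, written from memory, of the arithmetic the
loop vectorizer's cost model performed around this time; it is not GCC's code.
The struct and function names are hypothetical.  The vectorization factor of 4
is an assumption (the dump does not print it, but it matches vector(4) float),
and the final "max with VF, minus one" step is an assumption chosen because it
reproduces both printed thresholds.

```cpp
#include <algorithm>
#include <cassert>

// Inputs as printed in a "Cost model analysis" dump.  All names here are
// hypothetical; vf is an assumed value that the dump does not print.
struct VectCosts {
  int vec_inside;      // Vector inside of loop cost
  int vec_prologue;    // Vector prologue cost
  int vec_epilogue;    // Vector epilogue cost
  int scalar_iter;     // Scalar iteration cost
  int scalar_outside;  // Scalar outside cost
  int peel_prologue;   // prologue iterations
  int peel_epilogue;   // epilogue iterations
  int vf;              // assumed vectorization factor
};

// "Vector outside cost" is the sum of the prologue and epilogue costs.
int vec_outside_cost(const VectCosts &c) {
  return c.vec_prologue + c.vec_epilogue;
}

// Smallest iteration count at which the vector loop is estimated to beat
// the scalar loop ("Calculated minimum iters for profitability").
int min_profitable_iters(const VectCosts &c) {
  const int saving = c.scalar_iter * c.vf - c.vec_inside;
  assert(saving > 0);  // otherwise vectorization never pays off
  const int outside = vec_outside_cost(c) - c.scalar_outside;
  int iters = (outside * c.vf
               - c.vec_inside * c.peel_prologue
               - c.vec_inside * c.peel_epilogue) / saving;
  // Integer division rounds down, so bump the result if the scalar loop
  // is still no more expensive at that count.
  if (c.scalar_iter * c.vf * iters <= c.vec_inside * iters + outside * c.vf)
    ++iters;
  return iters;
}

// Assumed final step: at least one full vector iteration, minus one
// because the threshold is compared with "<=".
int profitability_threshold(const VectCosts &c) {
  return std::max(min_profitable_iters(c), c.vf) - 1;
}
```

With the r248677 inputs (prologue 7, epilogue 6, two peeled iterations on each
side) this gives 17 and 16; with the r248678 inputs it gives 2 and 3.  The
function only writes eight floats, which is consistent with the loop being
rejected in the first case and vectorized in the second.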

For this processor, we vectorized the code in "vect" rather than in "slp".  For
other processors, the choice could be different because of cost model
differences.  But I think in general we should always vectorize.  In both cases
the "optimized" dump produces:

void mydata::Set(float) (struct mydata * const this, float x)
{
  vector(4) float vect_cst__10;

  <bb 2> [11.11%]:
  vect_cst__10 = {x_5(D), x_5(D), x_5(D), x_5(D)};
  MEM[(float *)this_4(D)] = vect_cst__10;
  MEM[(float *)this_4(D) + 16B] = vect_cst__10;
  return;

}
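
For readers who do not have the test source at hand, the GIMPLE above
corresponds to source along the following lines.  This is a reconstruction
from the dump only (eight contiguous floats at offset 0 of the object, all set
to x), not the actual contents of slp-pr56812.cc; the member name data and the
helper all_equal are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the class in the test.  Only the layout implied
// by the dump (two 16-byte stores starting at "this") is taken from it.
struct mydata {
  float data[8];
  void Set(float x);
};

// Scalar form of the function.  The dump shows it compiled to one splat of x
// into a vector(4) float followed by two vector stores.
void mydata::Set(float x) {
  for (std::size_t i = 0; i < 8; ++i)
    data[i] = x;
}

// Helper for checking the result: true if every element equals x.
bool all_equal(const mydata &m, float x) {
  for (std::size_t i = 0; i < 8; ++i)
    if (m.data[i] != x)
      return false;
  return true;
}
```

Whether the loop pass or the basic-block (SLP) pass does the work, the
observable effect is the same eight stores.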

So I think perhaps it would be better to change the test to examine the
"optimized" dump for one definition and two uses of a vect_cst__*.  The point
of the original complaint in PR56812 was that this test case was not vectorized
(by SLP at the time), but so long as it is vectorized, that should be good
enough for everyone.
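
As an illustration of the check proposed above, here is a small standalone
sketch that counts definitions and uses of vect_cst__* in the text of an
"optimized" dump.  In the testsuite itself this would be expressed as a
DejaGnu scan directive rather than a program; the function and struct names
below are hypothetical, and the line-based matching is an assumption that
holds for the dump shown here but not necessarily for every dump.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical result type: number of definitions and uses found.
struct VectCstCounts {
  int defs = 0;
  int uses = 0;
};

// Scan a dump line by line.  A line whose first token starts with
// "vect_cst__" and that contains " = " counts as a definition; any other
// line with "= vect_cst__" on its right-hand side counts as a use.
// Declarations such as "vector(4) float vect_cst__10;" match neither.
VectCstCounts count_vect_cst(const std::string &dump) {
  VectCstCounts counts;
  std::istringstream in(dump);
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos)
      continue;  // blank line
    if (line.compare(first, 10, "vect_cst__") == 0 &&
        line.find(" = ", first) != std::string::npos)
      ++counts.defs;
    else if (line.find("= vect_cst__") != std::string::npos)
      ++counts.uses;
  }
  return counts;
}
```

Applied to the dump of mydata::Set quoted earlier, this finds one definition
and two uses, which is the condition suggested for the revised test.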

2018-02-02 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

Bill Schmidt  changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
        Assignee|acsawdey at gcc dot gnu.org   |wschmidt at gcc dot gnu.org

--- Comment #7 from Bill Schmidt  ---
I'm looking at this one.

2018-01-26 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

ktkachov at gcc dot gnu.org changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Target|powerpc*-*-*, i?86-*-*,       |powerpc*-*-*, i?86-*-*,
                |x86_64-*-*, aarch64-*-*       |x86_64-*-*, aarch64-*-*,
                |                              |arm*-*-*
              CC|                              |ktkachov at gcc dot gnu.org

--- Comment #6 from ktkachov at gcc dot gnu.org ---
I'm also seeing this FAIL on arm

2018-01-23 Thread ro at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

Rainer Orth  changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Target|powerpc*-*-*                  |powerpc*-*-*, i?86-*-*,
                |                              |x86_64-*-*, aarch64-*-*
              CC|                              |ro at gcc dot gnu.org

--- Comment #5 from Rainer Orth  ---
Just for the record, this also affects several x86 targets.

2018-01-09 Thread acsawdey at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

acsawdey at gcc dot gnu.org changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Status|NEW                           |ASSIGNED
              CC|                              |acsawdey at gcc dot gnu.org
        Assignee|unassigned at gcc dot gnu.org |acsawdey at gcc dot gnu.org

--- Comment #4 from acsawdey at gcc dot gnu.org ---
At present trunk vectorizes this loop in the vect pass, rather than unrolling
it and vectorizing in slp.

Code generated for mydata::Set is:

_ZN6mydata3SetEf:
.LFB4:
        .cfi_startproc
        xscvdpspn 1,1
        li 9,16
        xxspltw 0,1,0
        stxvd2x 0,0,3
        stxvd2x 0,3,9
        blr

It seems like the test case should be looking for this alternative; I can't see
how a loop with a single stxvd2x that runs two iterations would be better.

2017-08-01 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

Steve Ellcey  changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Status|UNCONFIRMED                   |NEW
Last reconfirmed|                              |2017-08-01
              CC|                              |sje at gcc dot gnu.org
  Ever confirmed|0                             |1

--- Comment #3 from Steve Ellcey  ---
Looking at the slp dump file on aarch64, where this also fails, I see these
messages:

slp-pr56812.cc:18:1: note: === vect_analyze_data_refs ===
slp-pr56812.cc:18:1: note: not vectorized: no vectype for stmt: MEM[(float *)this_4(D)] = vect_cst__10;
 scalar_type: vector(4) float
slp-pr56812.cc:18:1: note: not vectorized: no vectype for stmt: MEM[(float *)vectp_this.5_6] = vect_cst__10;
 scalar_type: vector(4) float
slp-pr56812.cc:18:1: note: === vect_analyze_data_ref_accesses ===
slp-pr56812.cc:18:1: note: not vectorized: no grouped stores in basic block.

2017-06-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

Richard Biener  changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
Target Milestone|---                           |8.0

--- Comment #2 from Richard Biener  ---
Possibly the loop is no longer unrolled (was it?) and is now loop-vectorized?
(And that bit is "fragile" because of -fvect-cost-model=dynamic?)

Just guessing.

2017-06-09 Thread seurer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81038

seurer at gcc dot gnu.org changed:

            What|Removed                       |Added
----------------------------------------------------------------------------
          Target|                              |powerpc*-*-*
              CC|                              |krebbel at gcc dot gnu.org,
                |                              |wschmidt at gcc dot gnu.org
            Host|                              |powerpc*-*-*
           Build|                              |powerpc*-*-*

--- Comment #1 from seurer at gcc dot gnu.org ---
Note:  fails on powerpc64 both BE and LE.