[Bug tree-optimization/83202] Try joining operations on consecutive array elements during tree vectorization

2019-04-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Richard Biener  ---
The comment #4 case is something completely different.  If it's really interesting
to re-vectorize already vectorized code, please file a different bug.

The other testcases seem to work fine for me now.

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202
Bug 83202 depends on bug 83326, which changed state.

Bug 83326 Summary: [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #8 from Richard Biener  ---
Author: rguenth
Date: Thu Nov 30 07:53:31 2017
New Revision: 255267

URL: https://gcc.gnu.org/viewcvs?rev=255267&root=gcc&view=rev
Log:
2017-11-30  Richard Biener  

PR tree-optimization/83202
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
allow_peel argument and guard peeling.
(canonicalize_loop_induction_variables): Likewise.
(canonicalize_induction_variables): Pass false.
(tree_unroll_loops_completely_1): Pass unroll_outer to disallow
peeling from cunrolli.

* gcc.dg/vect/pr83202-1.c: New testcase.
* gcc.dg/tree-ssa/pr61743-1.c: Adjust.

Added:
trunk/gcc/testsuite/gcc.dg/vect/pr83202-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
trunk/gcc/tree-ssa-loop-ivcanon.c

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #9 from Richard Biener  ---
The last commit fixed the testcase in comment #1.

2017-11-29 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #7 from rguenther at suse dot de  ---
On Wed, 29 Nov 2017, bugzi...@poradnik-webmastera.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202
> 
> --- Comment #4 from Daniel Fruzynski  ---
> One more case. Code has to process a diagonal half of a matrix and uses SSE
> intrinsics - see test1() below. When n is constant, as in test2() below, gcc
> unrolls the loops. However, one more transform could be performed: replace
> pairs of SSE instructions with one AVX one.

GCC currently does not "vectorize" already vectorized code, so
this is a much more distant "goal", apart from eventually pattern-matching
some very simple cases.

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #6 from Richard Biener  ---
There are multiple issues reflected in this bug.  The last commit addressed the
SLP cost model issue (it does not fix any testcase on its own).

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #5 from Richard Biener  ---
Author: rguenth
Date: Wed Nov 29 14:38:06 2017
New Revision: 255233

URL: https://gcc.gnu.org/viewcvs?rev=255233&root=gcc&view=rev
Log:
2017-11-29  Richard Biener  

PR tree-optimization/83202
* tree-vect-slp.c (scalar_stmts_set_t): New typedef.
(bst_fail): Use it.
(vect_analyze_slp_cost_1): Add visited set, do not account SLP
nodes vectorized to the same stmts multiple times.
(vect_analyze_slp_cost): Allocate a visited set and pass it down.
(vect_analyze_slp_instance): Adjust.
(scalar_stmts_to_slp_tree_map_t): New typedef.
(vect_schedule_slp_instance): Add a map recording the SLP node
representing the vectorized stmts for a set of scalar stmts.
Avoid code-generating redundancies.
(vect_schedule_slp): Allocate map and pass it down.

* gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-slp.c
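The effect of the new visited set in vect_analyze_slp_cost_1 can be illustrated with a toy model. In this sketch, which is a heavy simplification and not GCC's actual code, each node carries an integer standing in for the set of scalar stmts it vectorizes; two distinct nodes built from the same scalar stmts share that value, and the cost walk skips a node whose stmts it has already accounted for. The types and function names (`slp_node`, `visited_add`, `cost_naive`, `cost_once`) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified SLP node.  `stmts` stands in for the set of
   scalar stmts the node vectorizes; two distinct nodes built from the
   same scalar stmts get the same value.  Not GCC's real data structure. */
struct slp_node {
    int stmts;
    int cost;
    const struct slp_node *kids[2];
};

/* Tiny fixed-capacity "visited set" of scalar-stmt sets. */
struct visited {
    int keys[32];
    size_t n;
};

/* Returns true if `key` was already present; otherwise inserts it. */
static bool visited_add(struct visited *v, int key)
{
    for (size_t i = 0; i < v->n; i++)
        if (v->keys[i] == key)
            return true;
    assert(v->n < sizeof v->keys / sizeof v->keys[0]);
    v->keys[v->n++] = key;
    return false;
}

/* Without a visited set: each duplicated node is costed once per copy. */
static int cost_naive(const struct slp_node *n)
{
    if (!n)
        return 0;
    return n->cost + cost_naive(n->kids[0]) + cost_naive(n->kids[1]);
}

/* With a visited set: a node whose scalar stmts were already accounted
   for (together with its operands) contributes nothing further. */
static int cost_once(const struct slp_node *n, struct visited *v)
{
    if (!n || visited_add(v, n->stmts))
        return 0;
    return n->cost + cost_once(n->kids[0], v) + cost_once(n->kids[1], v);
}
```

In a tree where a multiply has two operand nodes built from the same load stmts, `cost_naive` counts that load twice while `cost_once` counts it once. Per the ChangeLog, vect_schedule_slp_instance applies the same keying during code generation, through a map from a set of scalar stmts to the SLP node already representing them, so duplicated nodes do not produce redundant vector stmts.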

2017-11-29 Thread bugzi...@poradnik-webmastera.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #4 from Daniel Fruzynski  ---
One more case. Code has to process a diagonal half of a matrix and uses SSE
intrinsics - see test1() below. When n is constant, as in test2() below, gcc
unrolls the loops. However, one more transform could be performed: replace pairs
of SSE instructions with one AVX one.

#include <immintrin.h>

void test1(double data[100][100], unsigned int n)
{
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}

void test2(double data[100][100])
{
    const unsigned int n = 6;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < i; j += 2)
        {
            __m128d v = _mm_loadu_pd(&data[i][j]);
            v = _mm_mul_pd(v, v);
            _mm_storeu_pd(&data[i][j], v);
        }
    }
}
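The transform asked for here is semantically simple: when the two `__m128d` operations touch adjacent elements of the same row, as the unrolled inner loop of test2() does for `j` and `j + 2`, they compute the same values as a single 4-wide operation. The sketch below uses GCC's generic vector extension (`vector_size`) instead of intrinsics so that it compiles on any target; whether the 4-wide multiply becomes one AVX `vmulpd` depends on AVX being enabled (for example with `-mavx`), otherwise GCC splits it into narrower operations. The function names are illustrative only:

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang generic vector types: 2 doubles (the width of an SSE
   __m128d) and 4 doubles (the width of an AVX __m256d). */
typedef double v2df __attribute__((vector_size(16)));
typedef double v4df __attribute__((vector_size(32)));

/* What the unrolled code does on one row fragment: two independent
   2-wide squaring operations on adjacent memory. */
static void square_pairs(double *p)
{
    assert(p != NULL);
    v2df lo, hi;
    memcpy(&lo, p, sizeof lo);       /* unaligned load, like _mm_loadu_pd */
    memcpy(&hi, p + 2, sizeof hi);
    lo = lo * lo;
    hi = hi * hi;
    memcpy(p, &lo, sizeof lo);       /* unaligned store, like _mm_storeu_pd */
    memcpy(p + 2, &hi, sizeof hi);
}

/* The requested transform: the same work as one 4-wide operation. */
static void square_quad(double *p)
{
    assert(p != NULL);
    v4df v;
    memcpy(&v, p, sizeof v);
    v = v * v;
    memcpy(p, &v, sizeof v);
}
```

Both functions leave identical values in memory; the joined form only pays off when the two halves are known to be adjacent and independent, which is exactly what the compiler would have to prove about already-vectorized code.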

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

--- Comment #3 from Richard Biener  ---
For the other case, I think the issue is that the SLP instance group size is not
the number of scalar stmts but is instead set to the interleaving group size.
Changing that has quite a few knock-on effects, though.

-> GCC 9.

2017-11-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-11-29
 Blocks||53947
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
With += 4 the inner loop doesn't iterate, so it's effectively

void test(double data[4][4])
{
  for (int i = 0; i < 4; i++)
  {
    data[i][i] = data[i][i] * data[i][i];
    data[i][i+1] = data[i][i+1] * data[i][i+1];
  }
}

We fail to SLP here because we get confused by the computed group size of 5,
as there's a gap of three elements between the stores of consecutive iterations.
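The numbers follow from the row-major layout of `double data[4][4]`: `data[i][i]` lives at flat offset `4*i + i = 5*i`, so the two stores of iteration `i` hit offsets `5*i` and `5*i + 1`, the next iteration starts 5 elements later, and the 3 elements in between are never written. A minimal C sketch that computes these offsets (the helper names are illustrative only, not GCC internals):

```c
#include <assert.h>
#include <stddef.h>

/* Row length of the double data[4][4] array in the testcase. */
enum { COLS = 4 };

/* Row-major flat offset of data[i][j], in elements. */
static size_t flat_offset(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Record the flat offsets written by the first n_iters iterations:
   data[i][i] and data[i][i+1], two stores per iteration.
   `out` must have room for 2 * n_iters entries. */
static void store_offsets(size_t *out, size_t n_iters)
{
    assert(out != NULL);
    for (size_t i = 0; i < n_iters; i++) {
        out[2 * i]     = flat_offset(i, i);
        out[2 * i + 1] = flat_offset(i, i + 1);
    }
}

/* Interleaving group size: distance between the first stores of
   consecutive iterations, COLS + 1 = 5 elements. */
static size_t group_size(void)
{
    return flat_offset(1, 1) - flat_offset(0, 0);
}

/* Elements of each group that are never stored: 5 - 2 = 3. */
static size_t gap_size(void)
{
    return group_size() - 2;
}
```

So the group has 5 slots but only 2 scalar stmts, which is the mismatch between group size and number of scalar stmts discussed in comment #3.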

When later doing BB vectorization we fail to analyze dependences, likely
because we do not analyze refs as thoroughly as we do for loops.

For your second example we fail to loop-vectorize this because we completely
peel the inner loop in cunrolli, leaving control flow inside the outer loop.
I have a patch for that one.
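The effect of complete peeling can be sketched on a hypothetical loop nest (not the testcase from comment #1, which is not quoted here) whose inner trip count depends on the outer index but is bounded by a constant. After peeling, each copy of the inner body is followed by the inner loop's exit test, so the outer loop body contains nested branches instead of a simple inner loop. The function names below are made up for illustration:

```c
#include <assert.h>

enum { N = 4 };

/* Hypothetical loop nest: the inner trip count (i) varies with the
   outer index but never exceeds N - 1, so it can be peeled completely. */
static void nest_original(double a[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < i; j++)
            a[i][j] *= a[i][j];
}

/* Roughly what complete peeling of the inner loop produces: each copy
   of the body is followed by the inner loop's exit test. */
static void nest_peeled(double a[N][N])
{
    assert(N == 4); /* the hand-written copies below assume N == 4 */
    for (int i = 0; i < N; i++) {
        if (i > 0) {
            a[i][0] *= a[i][0];
            if (i > 1) {
                a[i][1] *= a[i][1];
                if (i > 2)
                    a[i][2] *= a[i][2];
            }
        }
    }
}

/* Returns 1 if both versions leave identical results in the array. */
static int peeled_matches_original(void)
{
    double x[N][N], y[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = y[i][j] = i * N + j + 1;
    nest_original(x);
    nest_peeled(y);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (x[i][j] != y[i][j])
                return 0;
    return 1;
}
```

Both versions compute the same result; only the shape of the control flow differs. The ChangeLog for r255267 reflects the fix: peeling is now guarded by an allow_peel argument, and cunrolli is no longer allowed to peel.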


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

2017-11-28 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
  Component|c   |tree-optimization
   Severity|normal  |enhancement