[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2016-08-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|4.9.4   |---

[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2015-06-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.9.3   |4.9.4


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2015-06-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org ---
GCC 4.9.3 has been released.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-11-17 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #14 from rguenther at suse dot de rguenther at suse dot de ---
On Fri, 14 Nov 2014, jakub at gcc dot gnu.org wrote:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984
 
 Jakub Jelinek jakub at gcc dot gnu.org changed:
 
What|Removed |Added
 
  Status|ASSIGNED|NEW
  CC||jamborm at gcc dot gnu.org,
||rguenth at gcc dot gnu.org
Assignee|jakub at gcc dot gnu.org   |unassigned at gcc dot 
 gnu.org
 
 --- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org ---
 (In reply to Stupachenko Evgeny from comment #12)
  Created attachment 33963 [details]
  test case where pragma simd disable vectorization
  
  The following test case compiled with -Ofast vectorize the loop in the
  GetXsum function.
  Adding -fopenmp leads to failed vectorization due to:
  
  simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
  D.2329[_7].x = _12;
  
  It looks like before the patch in this Bug loop was vectorized with 
  -fopenmp.
 
 The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
 has invalid inter-iteration dependencies.
 
 That said, even with that, with C it vectorizes fine, while with C++ it
 doesn't.
 
 In *.einline the C - C++ difference is (before that I don't see such):
 -  D.1856[_19].x = _24;
 -  _26 = D.1856[_19];
 -  _27 = MEM[(const struct XY *)_26].x;
 +  D.2352[_19].x = _24;
 +  _26 = D.2352[_19];
 +  _40 = MEM[(float *)_26];
 
 In *.ealias the C - C++ difference is:
 -  D.1856[_19].x = _24;
 -  _27 = MEM[(const struct XY *)D.1856][_19].x;
 +  D.2352[_19].x = _24;
 +  _26 = D.2352[_19];
 +  _40 = MEM[(float *)_26];
 
 and apparently FRE1 handles the former but not the latter.  Richard?
 As the struct contains float at that offset, I don't see why FRE1 shouldn't
 optimize that to _40 = _24.
 
 Shorter testcase for the FRE1 missed-optimization:
 struct S { float a, b; };
 
 float
 foo (int x, float y)
 {
   struct S z[1024];
   z[x].a = y;
   struct S *p = z[x];
   float *q = (float *) p;
   return *q;
 }

I will have a look - it's designed to handle that fine.

 (dunno why the inliner handles things differently between C and C++ on the 
 #c12
 testcase).  Now, as for vectorizing it even if FRE isn't able to optimize it,
 we currently don't support interleaved accesses to the omp simd array
 attributed arrays, perhaps we could at least some easy cases thereof, and
 supposedly we should teach SRA about those too (like, if the arrays aren't
 addressable and aren't accesses as whole, but just individual fields, split it
 into separate omp simd array accesses instead.  In this particular case due
 to the FRE missed optimization it is addressable though.
 Or perhaps teach fold to gimple folding to fold that:
   q_5 = z[x_2(D)];
   _6 = *q_5;
 back into:
   _6 = z[x_2(D)].x;
 ?

No, that's generally invalid (forwprop does that if types match
closely enough which appearantly they don't?)


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, so SCCVN only can forward constant addresses (because variable ones
wouldn't fit in the same refs vector without inserting ops).  See
vn_reference_maybe_forwprop_address.  It's also required so that we can
re-materialize the expression in PRE for example and not wind up with
invalid access paths there (similar to the reason why forwprop doesn't do
this transform).

I think it might be possible to teach SCCVN to do this, but it might be
not trivial.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-11-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
 CC||jamborm at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
   Assignee|jakub at gcc dot gnu.org   |unassigned at gcc dot 
gnu.org

--- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org ---
(In reply to Stupachenko Evgeny from comment #12)
 Created attachment 33963 [details]
 test case where pragma simd disable vectorization
 
 The following test case compiled with -Ofast vectorize the loop in the
 GetXsum function.
 Adding -fopenmp leads to failed vectorization due to:
 
 simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
 D.2329[_7].x = _12;
 
 It looks like before the patch in this Bug loop was vectorized with -fopenmp.

The testcase is invalid, you need reduction(+:sim) clause, otherwise the loop
has invalid inter-iteration dependencies.

That said, even with that, with C it vectorizes fine, while with C++ it
doesn't.

In *.einline the C - C++ difference is (before that I don't see such):
-  D.1856[_19].x = _24;
-  _26 = D.1856[_19];
-  _27 = MEM[(const struct XY *)_26].x;
+  D.2352[_19].x = _24;
+  _26 = D.2352[_19];
+  _40 = MEM[(float *)_26];

In *.ealias the C - C++ difference is:
-  D.1856[_19].x = _24;
-  _27 = MEM[(const struct XY *)D.1856][_19].x;
+  D.2352[_19].x = _24;
+  _26 = D.2352[_19];
+  _40 = MEM[(float *)_26];

and apparently FRE1 handles the former but not the latter.  Richard?
As the struct contains float at that offset, I don't see why FRE1 shouldn't
optimize that to _40 = _24.

Shorter testcase for the FRE1 missed-optimization:
struct S { float a, b; };

float
foo (int x, float y)
{
  struct S z[1024];
  z[x].a = y;
  struct S *p = z[x];
  float *q = (float *) p;
  return *q;
}

(dunno why the inliner handles things differently between C and C++ on the #c12
testcase).  Now, as for vectorizing it even if FRE isn't able to optimize it,
we currently don't support interleaved accesses to the omp simd array
attributed arrays, perhaps we could at least some easy cases thereof, and
supposedly we should teach SRA about those too (like, if the arrays aren't
addressable and aren't accesses as whole, but just individual fields, split it
into separate omp simd array accesses instead.  In this particular case due
to the FRE missed optimization it is addressable though.
Or perhaps teach fold to gimple folding to fold that:
  q_5 = z[x_2(D)];
  _6 = *q_5;
back into:
  _6 = z[x_2(D)].x;
?


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-11-13 Thread evstupac at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #12 from Stupachenko Evgeny evstupac at gmail dot com ---
Created attachment 33963
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33963action=edit
test case where pragma simd disable vectorization

The following test case compiled with -Ofast vectorize the loop in the
GetXsum function.
Adding -fopenmp leads to failed vectorization due to:

simd_issue.cpp:26:18: note: not vectorized: data ref analysis failed
D.2329[_7].x = _12;

It looks like before the patch in this Bug loop was vectorized with -fopenmp.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-10-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.9.2   |4.9.3

--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org ---
GCC 4.9.2 has been released.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-07-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.9.1   |4.9.2

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org ---
GCC 4.9.1 has been released.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-04-22 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.9.0   |4.9.1

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org ---
GCC 4.9.0 has been released


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-08 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org ---
Author: jakub
Date: Sat Feb  8 09:10:14 2014
New Revision: 207629

URL: http://gcc.gnu.org/viewcvs?rev=207629root=gccview=rev
Log:
PR c/59984
* gimplify.c (gimplify_bind_expr): In ORT_SIMD region
mark local addressable non-static vars as GOVD_PRIVATE
instead of GOVD_LOCAL.
* omp-low.c (lower_omp_for): Move gimple_bind_vars
and BLOCK_VARS of gimple_bind_block to new_stmt rather
than copying them.

* gcc.dg/vect/pr59984.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/pr59984.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c
trunk/gcc/omp-low.c
trunk/gcc/testsuite/ChangeLog


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-08 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org ---
The wrong-code issue is now fixed, but we still don't vectorize this.
See http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00514.html
for details.  Note that without making the two pointer parameters linear on the
elemental function it actually will not be successfully vectorized unless you
have CPU that can do scatter (Skylake, anything else?).

Keeping this PR open so that I don't forget to address this.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-08 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-07 Thread izamyatin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #4 from Igor Zamyatin izamyatin at gmail dot com ---
vect details show that v1.0_14 = v1 and v2.1_15 = v2 are treated as invariants:

test.c:24:14: note: --vectorizing statement: v1.0_14 = v1;

test.c:24:14: note: transform statement.
test.c:24:14: note: transform load. ncopies = 1
test.c:24:14: note: create vector_type-pointer variable to type: vector(4) int 
vectorizing a pointer ref: v1
test.c:24:14: note: created vectp_v1.11_40
test.c:24:14: note: add new stmt: vect_v1.12_37 = MEM[(int *)vectp_v1.10_39];

test.c:24:14: note: hoisting out of the vectorized loop: v1.0_14 = v1;

test.c:24:14: note: created new init_stmt: vect_cst_.13_7 = {v1.0_8, v1.0_8,
v1.0_8, v1.0_8};

test.c:24:14: note: --vectorizing statement: v2.1_15 = v2;

test.c:24:14: note: transform statement.
test.c:24:14: note: transform load. ncopies = 1
test.c:24:14: note: create vector_type-pointer variable to type: vector(4) int 
vectorizing a pointer ref: v2
test.c:24:14: note: created vectp_v2.15_1
test.c:24:14: note: add new stmt: vect_v2.16_60 = MEM[(int *)vectp_v2.14_58];

test.c:24:14: note: hoisting out of the vectorized loop: v2.1_15 = v2;

test.c:24:14: note: created new init_stmt: vect_cst_.17_62 = {v2.1_61, v2.1_61,
v2.1_61, v2.1_61};


Step for both loads determined as 0. Seems support for such case should be
explicitly added


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org ---
I guess during gimplification when gimplifying #pragma omp simd body we should
add all the local variables (declared in the body) that aren't static and are
address taken to private clause on the #pragma omp simd, so that they will be
handled like if you in your testcase do instead:
int v1, v2;
#pragma omp simd private (v1, v2)
for(i = 0; i  1000; i++)
{
  foo(a[i], v1, v2);
  a[i] = v1 * v2;
}

We don't vectorize that right now, but that is something that is likely
fixable.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-07 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 32077
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32077action=edit
Untested fix

Untested fix.  Likely I'll need to do something more, so that we don't emit
extra decls for the debug info outside of the #pragma omp simd (flag on the
private clause?) and add a testcase.

And, the reason why with this patch (or the adjusted testcase with explicit
private clause) isn't vectorized is that we'd need to teach vectorizer how to
vectorize SSA_NAME = array[SSA_NAME]; where the second SSA_NAME is
GOMP_SIMD_LANE () result and array is an omp simd array array, as a vector
of addresses (initialized before the loop), containing { array[0], array[1],
..., array[vf-1] }.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-06 Thread izamyatin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #3 from Igor Zamyatin izamyatin at gmail dot com ---
Vectorizer dump snippet for main:

  foo.simdclone.0 (vect__12.7_3, vect_cst_.8_53, vect_cst_.8_53,
vect_cst_.9_51, vect_cst_.9_51);
  GIMPLE_NOP
  vect_v1.12_37 = MEM[(int *)vectp_v1.10_39];   (1)
  v1.0_14 = v1;
  vect_v2.16_60 = MEM[(int *)vectp_v2.14_58];   (2)
  v2.1_15 = v2;
  vect__16.18_63 = vect_cst_.13_7 * vect_cst_.17_62;  --- constants instead of 
  _16 = v1.0_14 * v2.1_15; vect_v1.12_37 and
  MEM[(int *)vectp_a.19_65] = vect__16.18_63;  vect_v2.16_60 


Then DCE destroys (1) and (2) and later LIM hoists the multiplication away from
the loop.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-06 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||openmp
 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |4.9.0


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-02-05 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-02-05
 CC||mpolacek at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Marek Polacek mpolacek at gcc dot gnu.org ---
Confirmed.


[Bug c/59984] OpenMP and Cilk Plus SIMD pragma makes loop incorrect

2014-01-29 Thread evstupac at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59984

--- Comment #1 from Stupachenko Evgeny evstupac at gmail dot com ---
Created attachment 31981
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31981action=edit
test case