[Bug tree-optimization/91732] Adding omp simd pragma prevents vectorization

2019-09-11 Thread jed at 59A2 dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732

--- Comment #3 from Jed Brown  ---
> why not use gsym[Q*2*j+i] instead of g[j][0] and similarly gsym[Q*2-j*Q+i] 
> instead of g[j][1]?

The pattern here is that gsym is packed storage of a symmetric 2x2 matrix,
while g unpacks it so that inner loops (intended for unrolling) can be written
using index notation. This case (a finite element quadrature routine for 2D
anisotropic Poisson) is reduced from more complicated examples (such as 3D
nonlinear solid and fluid mechanics) where this technique provides substantial
clarity and correspondence to mathematical notation. The suggested
transformation (eliminating the temporary g[][] in exchange for fancy indexing
of gsym) is problematic when representing higher order tensors
(https://en.wikipedia.org/wiki/Voigt_notation#Mnemonic_rule).
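
As a rough illustration of the two styles being compared, the sketch below
contrasts the unpacked index-notation form with direct indexing of gsym. The
function names apply_unpacked and apply_packed, the input array du, and the
packed layout (g00 at offset 0, g11 at offset Q, g01 at offset 2*Q, inferred
from the indexing quoted above) are assumptions for illustration, not the
reduced test case itself.

```c
/* Sketch only: names and packed layout are assumptions (see note above). */

/* Unpacked form: g[][] is a per-iteration temporary, so the inner loop
   reads like index notation. */
void apply_unpacked(int Q, const double *restrict gsym,
                    const double *restrict du, double *restrict dv)
{
  for (int i = 0; i < Q; i++) {
    const double g[2][2] = {{gsym[i + Q*0], gsym[i + Q*2]},
                            {gsym[i + Q*2], gsym[i + Q*1]}};
    for (int j = 0; j < 2; j++)
      dv[i + Q*j] = g[j][0] * du[i + Q*0] + g[j][1] * du[i + Q*1];
  }
}

/* Packed form: no temporary, but each column of g needs its own
   indexing rule into gsym. */
void apply_packed(int Q, const double *restrict gsym,
                  const double *restrict du, double *restrict dv)
{
  for (int i = 0; i < Q; i++)
    for (int j = 0; j < 2; j++)
      dv[i + Q*j] = gsym[Q*2*j + i] * du[i]
                  + gsym[Q*2 - j*Q + i] * du[i + Q];
}
```

Both functions compute the same result for a 2x2 symmetric matrix; the
difference is only in how readable the indexing stays as the tensor order
grows.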

It's also sometimes desirable to roll the second loop instead of repeating, in
which case you don't get to have a different indexing rule for g[j][0] and
g[j][1].

  for (int i=0; i
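
A hypothetical sketch of the rolled form described above (not a reconstruction
of the truncated snippet): with the second loop rolled, g is only ever
accessed as g[j][k], so a single indexing rule has to cover every entry. The
function name apply_rolled, the array du and the packed layout of gsym are
assumptions.

```c
/* Sketch only: names and packed layout are assumptions. */
void apply_rolled(int Q, const double *restrict gsym,
                  const double *restrict du, double *restrict dv)
{
  for (int i = 0; i < Q; i++) {
    /* Unpack the symmetric 2x2 matrix for quadrature point i. */
    const double g[2][2] = {{gsym[i + Q*0], gsym[i + Q*2]},
                            {gsym[i + Q*2], gsym[i + Q*1]}};
    for (int j = 0; j < 2; j++) {
      double sum = 0.0;
      /* Rolled contraction over k: one access pattern, g[j][k]. */
      for (int k = 0; k < 2; k++)
        sum += g[j][k] * du[i + Q*k];
      dv[i + Q*j] = sum;
    }
  }
}
```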

[Bug tree-optimization/91732] Adding omp simd pragma prevents vectorization

2019-09-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
The routine is obfuscated too much, why not use gsym[Q*2*j+i] instead of
g[j][0] and similarly gsym[Q*2-j*Q+i] instead of g[j][1]?
The reason this isn't vectorized is that we need to effectively privatize the g
variable, because every SIMD lane needs different values for it, and SRA isn't
able to split that apart into scalars indexed by the SIMD lane.
So, in the end this is pretty much a dup of PR91020.
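
To illustrate what splitting g into scalars would mean, here is a hand-written
sketch in which the 2x2 temporary is replaced by three scalars, so nothing
array-shaped has to be privatized per SIMD lane. The function name
apply_scalarized, the array du and the packed layout of gsym are assumptions;
whether a given GCC version vectorizes this form is not claimed here.

```c
/* Sketch only: manual scalarization of the temporary g[][].
   Names and packed layout are assumptions. */
void apply_scalarized(int Q, const double *restrict gsym,
                      const double *restrict du, double *restrict dv)
{
  /* A "#pragma omp simd" would go directly above this loop. */
  for (int i = 0; i < Q; i++) {
    const double g00 = gsym[i + Q*0];
    const double g11 = gsym[i + Q*1];
    const double g01 = gsym[i + Q*2];  /* shared by g[0][1] and g[1][0] */
    dv[i]     = g00 * du[i] + g01 * du[i + Q];
    dv[i + Q] = g01 * du[i] + g11 * du[i + Q];
  }
}
```

The cost is the one described in the other comments: the index notation is
gone, and the approach scales poorly to higher order tensors.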

[Bug tree-optimization/91732] Adding omp simd pragma prevents vectorization

2019-09-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91732

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, openmp
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-11
                 CC|                            |jakub at gcc dot gnu.org
          Component|c                           |tree-optimization
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
The runtime alias test is between the two stores of the inner unrolled loop.
That's dv[i] vs. dv[i+Q].

Creating dr for *_61
analyze_innermost: success.
base_address: dv_44(D)
offset from base address: 0
constant offset from base address: 0
step: 8
base alignment: 8
base misalignment: 0
offset alignment: 512
step alignment: 8
base_object: *dv_44(D)
Access function 0: {0B, +, 8}_1
Creating dr for *_79
analyze_innermost: success.
base_address: (double *) dv_44(D) + (sizetype) ((long unsigned int) Q_35(D) * 8)
offset from base address: 0
constant offset from base address: 0
step: 8
base alignment: 8
base misalignment: 0
offset alignment: 512
step alignment: 8
base_object: *(double *) dv_44(D) + (sizetype) ((long unsigned int) Q_35(D) * 8)
Access function 0: {0B, +, 8}_1

This is probably due to unfortunate association, since inside the loop we
compute

  _11 = Q_35(D) + i_39;
  _12 = (long unsigned int) _11;
  _13 = _12 * 8;
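
As a small illustration of the association in question (a sketch with
hypothetical helper names), the byte offset of dv[i+Q] can be formed by adding
Q and i first and then scaling, as in the statements above, or by scaling each
term separately so that the loop-invariant part Q*8 is kept apart from the
i*8 induction. The two agree whenever Q + i does not overflow int.

```c
#include <stddef.h>

/* Mirrors the statements above: add in int, widen, then scale by 8
   (the size of a double is assumed to be 8 bytes, as in the dump). */
size_t offset_sum_then_scale(int Q, int i)
{
  int sum = Q + i;
  return (size_t)sum * 8;
}

/* Reassociated: invariant offset Q*8 plus the per-iteration i*8. */
size_t offset_scale_then_sum(int Q, int i)
{
  return (size_t)Q * 8 + (size_t)i * 8;
}
```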


With OpenMP SIMD we fail to analyze the data-refs:

Creating dr for D.4113[_37][1][0]
analyze_innermost: t.c:4:18: missed:  failed: evolution of offset is not affine.
base_address:
offset from base address:
constant offset from base address:
step:
base alignment: 0
base misalignment: 0
offset alignment: 0
step alignment: 0
base_object: D.4113
Access function 0: 0
Access function 1: 1
Access function 2: scev_not_known;

where _37 is the SIMD lane.
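
Conceptually, the privatized temporary behaves like an array with an extra
leading dimension indexed by the SIMD lane, which is what an access such as
D.4113[_37][1][0] reflects. The sketch below writes that out by hand; VF,
g_priv, du, the function name and the packed layout of gsym are hypothetical,
and this models the idea rather than GCC's actual lowering.

```c
enum { VF = 4 };  /* hypothetical number of SIMD lanes */

/* Sketch only: one private copy of g per lane, selected by the lane index. */
void apply_lane_privatized(int Q, const double *restrict gsym,
                           const double *restrict du, double *restrict dv)
{
  for (int i0 = 0; i0 < Q; i0 += VF) {
    double g_priv[VF][2][2];
    /* Handle a final partial block of fewer than VF iterations. */
    int lanes = (Q - i0 < VF) ? Q - i0 : VF;
    for (int lane = 0; lane < lanes; lane++) {
      int i = i0 + lane;
      g_priv[lane][0][0] = gsym[i + Q*0];
      g_priv[lane][0][1] = gsym[i + Q*2];
      g_priv[lane][1][0] = gsym[i + Q*2];
      g_priv[lane][1][1] = gsym[i + Q*1];
      for (int j = 0; j < 2; j++)
        dv[i + Q*j] = g_priv[lane][j][0] * du[i]
                    + g_priv[lane][j][1] * du[i + Q];
    }
  }
}
```

The leading index here is the lane rather than a function of the loop counter
with a fixed stride, which is consistent with the dump reporting no known
evolution for that dimension.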


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations