[Bug tree-optimization/111830] "omp simd reduction" cannot collaborate well with “loop peeling”.

2023-10-16 Thread guojie at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111830

--- Comment #1 from Guo Jie ---
Details in PR111403.

[Bug tree-optimization/111830] New: "omp simd reduction" cannot collaborate well with “loop peeling”.

2023-10-16 Thread guojie at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111830

Bug ID: 111830
   Summary: "omp simd reduction" cannot collaborate well with
“loop peeling”.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: openmp
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojie at loongson dot cn
  Target Milestone: ---

[Bug target/111403] LoongArch: Wrong code with -O -mlasx -fopenmp-simd

2023-10-08 Thread guojie at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Guo Jie changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojie at loongson dot cn

--- Comment #2 from Guo Jie ---
It seems that “omp simd reduction” cannot collaborate well with “loop peeling”,
which can make this test case fail intermittently.

LoongArch tree vect pass dump:

  # “omp simd” temporary arrays.
  struct S D.3833[8];
  struct S D.3832[8];
  ...


  # prologue loop.
   [local count: 723433550]:
  MEM  [(struct S *)][0].s = 0;
  _44 = D.3832[0].s;
  _41 = (long unsigned int) i_1;
  _58 = _41 * 4;
  _59 = a_18(D) + _58;
  _60 = _59->s;
  _61 = _44 + _60;
  D.3832[0].s = _61;
  _64 = D.3833[0].s;
  _65 = D.3832[0].s;
  _66 = _64 + _65;
  D.3833[0].s = _66;  # Save temporary reduction results.
  MEM  [(struct S *)][0].s = _66;
  _69 = b_28(D) + _58;
  _70 = MEM  [(const struct S &)][0].s;
  _69->s = _70;
  i_72 = i_1 + 1;
  ivtmp_73 = ivtmp_2 - 1;
  ivtmp_78 = ivtmp_77 + 1;
  if (ivtmp_78 < prolog_loop_niters.42_7)
    goto ; [85.71%]
  else
    goto ; [14.29%]
  [local count: 620085901]:
  goto ; [100.00%]


  # vector body loop.
   [local count: 118111599]:
  # i_48 = PHI 
  # ivtmp_55 = PHI 
  # vectp_a.50_126 = PHI 
  # vectp_b.58_158 = PHI 
  # ivtmp_161 = PHI 
  MEM  [(struct S *)] = { 0, 0, 0, 0, 0, 0, 0, 0 };
  _16 = (long unsigned int) i_48;
  _17 = _16 * 4;
  _19 = a_18(D) + _17;
  vect__20.52_128 = MEM  [(int *)vectp_a.50_126];
  _20 = _19->s;
  MEM  [(int *)] = vect__20.52_128;
  vect__24.54_131 = MEM  [(int *)]; # Wrong value.
  ...
  vect__26.56_133 = vect__20.52_128 + vect__24.54_131;
  ...
  if (ivtmp_162 < bnd.44_109)
    goto ; [0.00%]
  else
    goto ; [100.00%]
  ...

The “prologue loop” stores its temporary reduction result only in D.3833[0];
all other elements of D.3833 remain 0. Therefore, only the first element of
vect__26.56_133 accumulates the scalar reduction result of the “prologue loop”.

I think a reasonable solution would be to broadcast the scalar reduction
result of the “prologue loop” to all elements of D.3833.
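
To make the lane problem concrete, here is a small scalar model in C. It is
only a sketch and not GCC output. It assumes the loop keeps a running sum of
a[] and stores every partial sum to b[], which is what the stores to b in the
dump suggest. The names scan_ref, scan_model, mismatches, carry, peel and
broadcast are hypothetical; carry[] stands in for D.3833, and the steps elided
by "..." in the dump are replaced by a plain in-vector prefix sum.

```c
#define VF 8  /* lanes per vector; matches the 8-element temporaries in the dump */

/* Scalar reference: running sum of a[], every partial sum stored to b[]. */
static void scan_ref(const int *a, int *b, int n)
{
    int r = 0;
    for (int i = 0; i < n; i++) {
        r += a[i];
        b[i] = r;
    }
}

/* Hypothetical model of the peeled and vectorized loop (an assumption, not
   GCC's generated code).  carry[] plays the role of D.3833.  'peel' is the
   number of prologue iterations; 'broadcast' enables the proposed fix. */
static void scan_model(const int *a, int *b, int n, int peel, int broadcast)
{
    int carry[VF] = { 0 };
    int i = 0;

    /* Prologue loop: scalar iterations that only ever touch lane 0. */
    for (; i < peel && i < n; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }

    /* Proposed fix: copy the prologue result into every lane. */
    if (broadcast)
        for (int l = 1; l < VF; l++)
            carry[l] = carry[0];

    /* Vector body: VF elements per iteration. */
    for (; i + VF <= n; i += VF) {
        int run = 0;
        for (int l = 0; l < VF; l++) {
            run += a[i + l];            /* prefix sum inside the vector */
            b[i + l] = run + carry[l];  /* lane-wise add of the carried value */
        }
        for (int l = 0; l < VF; l++)
            carry[l] = b[i + VF - 1];   /* last lane feeds the next iteration */
    }

    /* Epilogue loop: remaining scalar iterations. */
    int r = (i > 0) ? b[i - 1] : 0;
    for (; i < n; i++) {
        r += a[i];
        b[i] = r;
    }
}

/* Number of positions where x[] and y[] differ. */
static int mismatches(const int *x, const int *y, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        bad += (x[i] != y[i]);
    return bad;
}
```

In this model, peel = 0 matches scan_ref. With peel > 0, broadcast = 0 and a
nonzero prologue sum, only lane 0 of the first vector iteration is correct and
the error propagates to later elements. With broadcast = 1 the output matches
scan_ref again.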