https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403
Guo Jie changed:

What |Removed |Added
CC   |        |guojie at loongson dot cn

--- Comment #2 from Guo Jie ---
It seems that “omp simd reduction” does not work well together with “loop
peeling”, which leads to an intermittent failure in this test case.
LoongArch tree vect pass dump:
# “omp simd” temporary arrays.
struct S D.3833[8];
struct S D.3832[8];
...
# prologue loop.
[local count: 723433550]:
MEM [(struct S *)][0].s = 0;
_44 = D.3832[0].s;
_41 = (long unsigned int) i_1;
_58 = _41 * 4;
_59 = a_18(D) + _58;
_60 = _59->s;
_61 = _44 + _60;
D.3832[0].s = _61;
_64 = D.3833[0].s;
_65 = D.3832[0].s;
_66 = _64 + _65;
D.3833[0].s = _66; # Save temporary reduction results.
MEM [(struct S *)][0].s = _66;
_69 = b_28(D) + _58;
_70 = MEM [(const struct S &)][0].s;
_69->s = _70;
i_72 = i_1 + 1;
ivtmp_73 = ivtmp_2 - 1;
ivtmp_78 = ivtmp_77 + 1;
if (ivtmp_78 < prolog_loop_niters.42_7)
goto ; [85.71%]
else
goto ; [14.29%]
[local count: 620085901]:
goto ; [100.00%]
# vector body loop.
[local count: 118111599]:
# i_48 = PHI
# ivtmp_55 = PHI
# vectp_a.50_126 = PHI
# vectp_b.58_158 = PHI
# ivtmp_161 = PHI
MEM [(struct S *)] = { 0, 0, 0, 0, 0, 0, 0, 0 };
_16 = (long unsigned int) i_48;
_17 = _16 * 4;
_19 = a_18(D) + _17;
vect__20.52_128 = MEM [(int *)vectp_a.50_126];
_20 = _19->s;
MEM [(int *)] = vect__20.52_128;
vect__24.54_131 = MEM [(int *)]; # Wrong value.
...
vect__26.56_133 = vect__20.52_128 + vect__24.54_131;
...
if (ivtmp_162 < bnd.44_109)
goto ; [0.00%]
else
goto ; [100.00%]
...
The temporary reduction result of “prologue loop” is only stored in D.3833[0],
and all other elements of D.3833 are 0. Therefore, only the first element of
vect__26.56_133 accumulates the scalar reduction result of “prologue loop”.
I think a reasonable solution would be to broadcast the scalar reduction
result of “prologue loop” to all elements of D.3833.
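
To make the effect easier to see outside the dump, here is a small standalone
C model of the situation described above. It is only a sketch under
assumptions, not what the vectorizer actually emits: the loop is treated as a
running sum, VF is taken to be 8 (matching D.3833[8]), and the names
scan_peeled, count_mismatches, carry and N are hypothetical. carry[] plays the
role of D.3833.

```c
#include <assert.h>

#define VF 8    /* assumed vectorization factor, matching D.3833[8] */
#define N  19   /* hypothetical trip count: 3 peeled iterations + 2 full vectors */

/* Hypothetical model of a peeled running-sum loop.  carry[] stands in for
   D.3833.  If `broadcast` is nonzero, the prologue result is copied to all
   lanes before the vector body starts (the fix proposed above).  */
static void scan_peeled(const int *a, int *b, int peel, int broadcast)
{
    int carry[VF] = {0};
    int i = 0;

    /* The model has no epilogue loop, so the rest must be whole vectors.  */
    assert(peel >= 0 && peel <= N && (N - peel) % VF == 0);

    /* Prologue loop: the scalar iterations only ever touch lane 0.  */
    for (; i < peel; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }

    if (broadcast)
        for (int l = 1; l < VF; l++)
            carry[l] = carry[0];

    /* Vector body: each lane adds its own carry-in to the in-block prefix sum.  */
    for (; i < N; i += VF) {
        int run = 0;
        for (int l = 0; l < VF; l++) {
            run += a[i + l];
            b[i + l] = run + carry[l];
        }
        /* The last lane's result becomes the carry-in for the next block.  */
        for (int l = 0; l < VF; l++)
            carry[l] = b[i + VF - 1];
    }
}

/* Compare the model against a plain scalar running sum and return the
   number of elements that differ.  */
static int count_mismatches(int peel, int broadcast)
{
    int a[N], ref[N], out[N], r = 0, bad = 0;

    for (int i = 0; i < N; i++) {
        a[i] = i + 1;
        r += a[i];
        ref[i] = r;
    }
    scan_peeled(a, out, peel, broadcast);
    for (int i = 0; i < N; i++)
        bad += (out[i] != ref[i]);
    return bad;
}
```

In this model, count_mismatches(3, 0) is nonzero because only lane 0 of the
first vector iteration sees the prologue sum, while count_mismatches(3, 1)
returns 0 once the prologue result is broadcast to every lane.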