[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0

2022-10-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:ee467644c53ee2f7d633a8e1f53603feafab4351

commit r13-3226-gee467644c53ee2f7d633a8e1f53603feafab4351
Author: Richard Biener 
Date:   Tue Oct 11 11:34:55 2022 +0200

tree-optimization/107212 - SLP reduction of reduction paths

The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical".  We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain.  Also, the
previous attempt, likely meant to mitigate this issue in
vectorizable_live_operation, is misguided and has to be removed.

PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.

* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.


2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #5 from Richard Biener  ---
Yes, I think the issue has been latent for longer.

2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #4 from Richard Biener  ---
So the speciality here is that with the SLP reduction we have the live lanes
split across the sum and the convert.  That wreaks havoc with
vectorizable_reduction following one of the lanes in the loop assigning
STMT_VINFO_REDUC_DEF to the reduction chain.  We simply do

  /* ???  For epilogue generation live members of the chain need
     to point back to the PHI via their original stmt for
     info_for_reduction to work.  */
  if (STMT_VINFO_LIVE_P (vdef))
    STMT_VINFO_REDUC_DEF (def) = phi_info;

but in this case this misses one of the paths.  Also we're not reliably
following the representative here.  Plus vectorizable_live_operation
doesn't get the representative but the actual scalar stmt defining the
live lane (on purpose).  So the fix is to make sure the above setting
of STMT_VINFO_REDUC_DEF covers all live lanes of the SLP node.  For
vectorizable_live_operation the

  else
    /* For SLP reductions the meta-info is attached to
       the representative.  */
    stmt_info = SLP_TREE_REPRESENTATIVE (slp_node);

is then wrong, and

  /* For SLP reductions we vectorize the epilogue for
     all involved stmts together.  */
  else if (slp_index != 0)
    return true;

is then also suspicious, but it seems we cope with the conversions just
fine.  So we're actually vectorizing the epilogue for live lane 0
in the reduction chain, but analysis might end up not following the lane 0
SSA use-def chain; identifying lane > 0 reductions only serves to avoid
non-reduction live code generation.
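
The difference between linking only the followed lane and linking every live lane can be sketched with a toy model. This is not GCC code: `struct toy_stmt` and the three functions below are hypothetical stand-ins, and the fields only mimic what STMT_VINFO_LIVE_P and STMT_VINFO_REDUC_DEF record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-statement vectorizer info; the real
   stmt_vec_info in GCC carries far more state.  */
struct toy_stmt
{
  bool live;                        /* models STMT_VINFO_LIVE_P */
  const struct toy_stmt *reduc_def; /* models STMT_VINFO_REDUC_DEF */
};

/* Simplified old behaviour: only the lane being followed is linked
   back to the PHI, and only if that lane is live.  */
void
mark_followed_lane (struct toy_stmt *lanes, size_t followed,
                    const struct toy_stmt *phi)
{
  if (lanes[followed].live)
    lanes[followed].reduc_def = phi;
}

/* Simplified fixed behaviour: every live lane of the node is linked
   back to the PHI.  */
void
mark_all_live_lanes (struct toy_stmt *lanes, size_t nlanes,
                     const struct toy_stmt *phi)
{
  assert (phi != NULL);
  for (size_t i = 0; i < nlanes; i++)
    if (lanes[i].live)
      lanes[i].reduc_def = phi;
}

/* Count live lanes that were never linked back to the PHI.  */
size_t
count_unlinked_live_lanes (const struct toy_stmt *lanes, size_t nlanes)
{
  size_t n = 0;
  for (size_t i = 0; i < nlanes; i++)
    if (lanes[i].live && lanes[i].reduc_def == NULL)
      n++;
  return n;
}
```

With two lanes where only lane 1 is live, following lane 0 leaves one live lane unlinked, while iterating over all lanes leaves none.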

2022-10-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #3 from Martin Liška  ---
Hmm, I have a test case that has been miscompiled since r10-4200-gb7ff7cef5005721e:

$ cat pr107212.c
int sum_1 = 0;

int main() {
  unsigned int tab[6][2] = {{150, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}};

  int sum_0 = 0;

  for (int t = 0; t < 6; t++) {
    sum_0 += tab[t][0];
    sum_1 += tab[t][0];
  }

  if (sum_0 < 100 || sum_0 > 200)
    __builtin_abort();
  return 0;
}

$ gcc pr107212.c -O3 -std=c99 && ./a.out
Aborted (core dumped)

2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

Richard Biener  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #2 from Richard Biener  ---
Confirmed.  We vectorize the loop and that triggers full constant folding
leading to the wrong result somehow.

Same issue with GCC 11 when you add -ftree-vectorize or use -O3, not observed
with GCC 10.

The reduction epilogue of the SLP reduction looks duplicated and wrong.

2022-10-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|-O2 and -O3 optimizer bug   |[11/12/13 Regression] Wrong
                   |                            |vectorizer code since
                   |                            |r11-718-gc735929a2503a7d0
                 CC|                            |marxin at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
          Component|c                           |tree-optimization
   Target Milestone|---                         |11.5
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2022-10-11

--- Comment #1 from Martin Liška  ---
A slightly reduced test case:

$ cat pr107212.c
int main() {
  unsigned int tab[6][2] = { {69, 73}, {36, 40}, {24, 16}, {16, 11}, {4, 5},
                             {3, 1} };

  int flag = 1;
  int sum_0 = 0;
  int sum_1 = 0;

  for (int t = 0; t < 6; t++) {
    sum_0 += tab[t][0];
    sum_1 += tab[t][1];
  }

  int x1 = (sum_0 < 100);
  int x2 = (sum_0 > 200);
  int x3 = (x1 || x2);

  if (sum_1 > 200) {
    flag = 0;
  }

  __builtin_printf("sum_1: %d\n", sum_1);
  if (x1 || x2)
    __builtin_abort ();

  return 0;
}

With -O3 it started with r11-718-gc735929a2503a7d0.
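
For reference, the column sums of this table are 69+36+24+16+4+3 = 152 and 73+40+16+11+5+1 = 146, so a correctly compiled program prints "sum_1: 146" and does not abort. A minimal scalar cross-check might look like the sketch below. The helper names `column_sum` and `in_expected_range` are hypothetical, and the volatile accumulator is only an assumption about how to discourage vectorization of the reference loop.

```c
#include <assert.h>

/* Scalar reference for one column sum.  The volatile accumulator is
   an assumption used to keep this loop from being vectorized.  */
int
column_sum (unsigned int tab[][2], int rows, int col)
{
  volatile int sum = 0;
  assert (col == 0 || col == 1);
  for (int t = 0; t < rows; t++)
    sum += tab[t][col];
  return sum;
}

/* Mirrors the abort condition of the test case: returns nonzero when
   sum_0 lies in the range for which the program must not abort.  */
int
in_expected_range (int sum_0)
{
  return !(sum_0 < 100 || sum_0 > 200);
}
```

Calling `column_sum (tab, 6, 0)` and `column_sum (tab, 6, 1)` on the table above gives the two reference values to compare against the output of the -O3 build.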