[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #6 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:ee467644c53ee2f7d633a8e1f53603feafab4351

commit r13-3226-gee467644c53ee2f7d633a8e1f53603feafab4351
Author: Richard Biener
Date:   Tue Oct 11 11:34:55 2022 +0200

    tree-optimization/107212 - SLP reduction of reduction paths

    The following fixes an issue with how we handle epilogue generation
    for SLP reductions of reduction paths where the actual live lanes
    are not "canonical".  We need to make sure to identify all live
    lanes as reductions and thus have to iterate over all participating
    SLP lanes when walking the reduction SSA use-def chain.  Also the
    previous attempt likely to mitigate such issue in
    vectorizable_live_operation is misguided and has to be removed.

            PR tree-optimization/107212
            * tree-vect-loop.cc (vectorizable_reduction): Make sure to
            set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
            reduction.
            (vectorizable_live_operation): Do not pun to the SLP
            node representative for reduction epilogue generation.

            * gcc.dg/vect/pr107212-1.c: New testcase.
            * gcc.dg/vect/pr107212-2.c: Likewise.

[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #5 from Richard Biener ---
Yes, I think the issue has been latent for longer.

[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #4 from Richard Biener ---
So the speciality here is that with the SLP reduction we have the live lanes
split across the sum and the convert.  That wreaks havoc with
vectorizable_reduction, which follows only one of the lanes in the loop that
assigns STMT_VINFO_REDUC_DEF to the reduction chain.  We simply do

      /* ???  For epilogue generation live members of the chain need
         to point back to the PHI via their original stmt for
         info_for_reduction to work.  */
      if (STMT_VINFO_LIVE_P (vdef))
        STMT_VINFO_REDUC_DEF (def) = phi_info;

but in this case this misses one of the paths.  Also we're not reliably
following the representative here.  Plus vectorizable_live_operation
doesn't get the representative but the actual scalar stmt defining the
live lane (on purpose).

So the fix is to make sure the above setting of STMT_VINFO_REDUC_DEF covers
all live lanes of the SLP node.  For vectorizable_live_operation, doing

          else
            /* For SLP reductions the meta-info is attached to
               the representative.  */
            stmt_info = SLP_TREE_REPRESENTATIVE (slp_node);

is then wrong, and

      /* For SLP reductions we vectorize the epilogue for
         all involved stmts together.  */
      else if (slp_index != 0)
        return true;

is also suspicious, but it seems we cope with the conversions just fine.
So we're actually vectorizing the epilogue for the live lane 0 in the
reduction chain, but analysis might end up not following the lane 0 SSA
use-def chain, and identifying lane > 0 reductions is just to avoid
non-reduction live code gen.

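The marking problem described in comment #4 can be illustrated with a toy model. This is only a sketch under loud assumptions: `toy_stmt`, `toy_node`, `toy_mark_single_lane` and `toy_mark_all_lanes` are hypothetical names invented for illustration and are not GCC data structures or functions. The model has two chain steps (standing in for the sum and the convert) and two lanes, with the live lanes split across the steps as the comment describes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LANES 2
#define TOY_STEPS 2
#define TOY_NO_PHI (-1)

/* Hypothetical stand-in for per-statement vectorizer info. */
struct toy_stmt {
  int live;       /* nonzero if the value is used after the loop */
  int reduc_def;  /* id of the reduction PHI, or TOY_NO_PHI if unset */
};

/* Hypothetical stand-in for an SLP node: one scalar stmt per lane. */
struct toy_node {
  struct toy_stmt lane[TOY_LANES];
};

/* Live lanes split across the chain: lane 0 is live at step 0
   (the "sum"), lane 1 is live at step 1 (the "convert").  */
static void toy_init(struct toy_node chain[TOY_STEPS])
{
  for (size_t s = 0; s < TOY_STEPS; s++)
    for (size_t l = 0; l < TOY_LANES; l++) {
      chain[s].lane[l].live = 0;
      chain[s].lane[l].reduc_def = TOY_NO_PHI;
    }
  chain[0].lane[0].live = 1;
  chain[1].lane[1].live = 1;
}

/* Walk the chain following a single lane; live stmts in the other
   lanes are never visited.  */
static void toy_mark_single_lane(struct toy_node chain[TOY_STEPS],
                                 size_t followed, int phi)
{
  for (size_t s = 0; s < TOY_STEPS; s++)
    if (chain[s].lane[followed].live)
      chain[s].lane[followed].reduc_def = phi;
}

/* Walk the chain and visit every lane of every step.  */
static void toy_mark_all_lanes(struct toy_node chain[TOY_STEPS], int phi)
{
  for (size_t s = 0; s < TOY_STEPS; s++)
    for (size_t l = 0; l < TOY_LANES; l++)
      if (chain[s].lane[l].live)
        chain[s].lane[l].reduc_def = phi;
}

/* Count live stmts that were not linked back to the PHI.  */
static size_t toy_count_unmarked_live(const struct toy_node chain[TOY_STEPS])
{
  size_t n = 0;
  for (size_t s = 0; s < TOY_STEPS; s++)
    for (size_t l = 0; l < TOY_LANES; l++)
      if (chain[s].lane[l].live && chain[s].lane[l].reduc_def == TOY_NO_PHI)
        n++;
  return n;
}
```

In this model, following only lane 0 leaves the live stmt in lane 1 without a link back to the PHI, while visiting all lanes leaves none unmarked. The actual change is in tree-vect-loop.cc and is considerably more involved.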
[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #3 from Martin Liška ---
Hmm, I have a test-case that is miscompiled since r10-4200-gb7ff7cef5005721e:

$ cat pr107212.c
int sum_1 = 0;

int main() {
  unsigned int tab[6][2] = {{150, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}};

  int sum_0 = 0;
  for (int t = 0; t < 6; t++) {
    sum_0 += tab[t][0];
    sum_1 += tab[t][0];
  }

  if (sum_0 < 100 || sum_0 > 200)
    __builtin_abort();
  return 0;
}

$ gcc pr107212.c -O3 -std=c99 && ./a.out
Aborted (core dumped)

[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

--- Comment #2 from Richard Biener ---
Confirmed.  We vectorize the loop, and that triggers full constant folding,
which somehow leads to the wrong result.  The same issue occurs with GCC 11
when you add -ftree-vectorize or use -O3; it is not observed with GCC 10.

The reduction epilogue of the SLP reduction looks duplicated and wrong.

[Bug tree-optimization/107212] [11/12/13 Regression] Wrong vectorizer code since r11-718-gc735929a2503a7d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
            Summary|-O2 and -O3 optimizer bug   |[11/12/13 Regression] Wrong
                   |                            |vectorizer code since
                   |                            |r11-718-gc735929a2503a7d0
                 CC|                            |marxin at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
          Component|c                           |tree-optimization
   Target Milestone|---                         |11.5
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2022-10-11

--- Comment #1 from Martin Liška ---
A bit reduced test-case:

$ cat pr107212.c
int main() {
  unsigned int tab[6][2] = {
    {69, 73},
    {36, 40},
    {24, 16},
    {16, 11},
    {4, 5},
    {3, 1}
  };

  int flag = 1;
  int sum_0 = 0;
  int sum_1 = 0;

  for(int t=0; t<6; t++) {
    sum_0 += tab[t][0];
    sum_1 += tab[t][1];
  }

  int x1 = (sum_0 < 100);
  int x2 = (sum_0 > 200);
  int x3 = (x1 || x2);

  if(sum_1 > 200) {
    flag=0;
  }

  __builtin_printf("sum_1: %d\n", sum_1);
  if (x1 || x2)
    __builtin_abort ();
  return 0;
}

With -O3 it started with r11-718-gc735929a2503a7d0.
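For reference, the values a correct compilation of the comment #1 test-case should produce can be worked out by hand: column 0 sums to 69 + 36 + 24 + 16 + 4 + 3 = 152 and column 1 to 73 + 40 + 16 + 11 + 5 + 1 = 146, so sum_0 lies inside [100, 200] and the program should not abort. The sketch below restates that check. `sum_columns`, `in_range`, `pr107212_tab` and `pr107212_self_check` are hypothetical names introduced here for illustration and are not part of the PR or the committed testcases.

```c
#include <assert.h>
#include <stddef.h>

/* The table from the comment #1 test-case.  */
static const unsigned int pr107212_tab[6][2] = {
  {69, 73}, {36, 40}, {24, 16}, {16, 11}, {4, 5}, {3, 1}
};

/* Hypothetical helper: sums both columns with the same loop shape
   as the test-case (two interleaved reductions over tab[t][0] and
   tab[t][1]).  */
static void sum_columns(const unsigned int tab[][2], size_t rows,
                        int *sum_0, int *sum_1)
{
  *sum_0 = 0;
  *sum_1 = 0;
  for (size_t t = 0; t < rows; t++) {
    *sum_0 += tab[t][0];
    *sum_1 += tab[t][1];
  }
}

/* Mirrors the test-case condition: abort when the value is
   below 100 or above 200.  */
static int in_range(int v)
{
  return !(v < 100 || v > 200);
}

/* Checks the hand-computed sums; trips an assert if the loop
   produced something else.  */
static void pr107212_self_check(void)
{
  int s0, s1;
  sum_columns(pr107212_tab, 6, &s0, &s1);
  assert(s0 == 152);
  assert(s1 == 146);
  assert(in_range(s0));
}
```

Because the loop sits in a separate function here, a compiler may optimize it differently from the original `main`, so this is a statement of the expected values rather than a guaranteed reproducer of the miscompile.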