[Bug tree-optimization/114375] [11/12/13/14 Regression] Wrong vectorization of permuted mask load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #3 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a

commit r14-9533-g94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a
Author: Richard Biener
Date:   Mon Mar 18 12:39:03 2024 +0100

    tree-optimization/114375 - disallow SLP discovery of permuted mask loads

    We cannot currently handle permutations of mask loads in code generation
    or permute optimization.  But we simply drop any permutation on the
    floor, so the following instead rejects the SLP build rather than
    producing wrong code.  I've also made sure to reject them in
    vectorizable_load for completeness.

            PR tree-optimization/114375
            * tree-vect-slp.cc (vect_build_slp_tree_2): Compute the load
            permutation for masked loads but reject it when any such is
            necessary.
            * tree-vect-stmts.cc (vectorizable_load): Reject masked
            VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
            supported.

            * gcc.dg/vect/vect-pr114375.c: New testcase.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #2 from Richard Biener ---
I see that get_load_store_type and get_group_load_store_type return VMAT_*
kinds that do not handle the masked case.  There's some rejection in the
callers for that case, but it's also incomplete (VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP, but I also think VMAT_CONTIGUOUS_PERMUTE/REVERSE misses
some handling, at least the optab check).

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e8617439a48..5a4eb136c6d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10080,6 +10080,14 @@ vectorizable_load (vec_info *vinfo,
                                  "unsupported masked emulated gather.\n");
               return false;
             }
+         else if (memory_access_type == VMAT_ELEMENTWISE
+                  || memory_access_type == VMAT_STRIDED_SLP)
+           {
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                "unsupported masked strided access.\n");
+             return false;
+           }
        }

      bool costing_p = !vec_stmt;

should plug the elementwise/SLP case.  Interestingly

void __attribute__((noipa))
foo (int s, int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 0;
      if (a[i])
        tem = p[s*i];
      b[i] = tem;
    }
}

uses VMAT_GATHER_SCATTER; I failed to get it to produce VMAT_ELEMENTWISE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |11.4.0, 12.3.0, 13.2.1,
                   |                            |14.0
   Target Milestone|---                         |11.5
           Keywords|                            |wrong-code
           Summary|Wrong vectorization of      |[11/12/13/14 Regression]
                   |permuted mask load          |Wrong vectorization of
                   |                            |permuted mask load
     Known to work|                            |10.5.0
                CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
I think it's most sensible to reject permuted mask (and also gather) SLP
loads for now; code generation definitely doesn't handle them right now.
A first step towards supporting them might be to compute the load
permutation and hope that SLP permute optimization makes it the identity,
rejecting it only afterwards.  SLP permute optimization might also be able
to push the permute onto the mask and re-materialize it after the masked
load.

I'll note we don't handle SLP masked loads with gaps either.

I'll see to producing the fix on the SLP discovery side (rejecting it
there).  Richard - any other thoughts on this?

It seems GCC 10 is fine, but it can already do SLP of masked loads so I'm
not really sure.  Guess the testcase is not good enough there.