[Bug tree-optimization/114375] [11/12/13/14 Regression] Wrong vectorization of permuted mask load

2024-03-19 cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a

commit r14-9533-g94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a
Author: Richard Biener 
Date:   Mon Mar 18 12:39:03 2024 +0100

tree-optimization/114375 - disallow SLP discovery of permuted mask loads

We cannot currently handle permutations of mask loads in code generation
or permute optimization.  But we simply drop any permutation on the
floor, so the following instead rejects the SLP build rather than
producing wrong-code.  I've also made sure to reject them in
vectorizable_load for completeness.

PR tree-optimization/114375
* tree-vect-slp.cc (vect_build_slp_tree_2): Compute the
load permutation for masked loads but reject it when any
such is necessary.
* tree-vect-stmts.cc (vectorizable_load): Reject masked
VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
supported.

* gcc.dg/vect/vect-pr114375.c: New testcase.
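The rejection rule in the commit message (always compute the load permutation for a masked load, but give up on the SLP build if a non-identity one is needed) can be modelled in a few lines of C. This is a hypothetical sketch, not the code in tree-vect-slp.cc: the function names compute_load_permutation and can_build_slp_load, and the representation of an SLP load node as an array of group element indices, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one SLP load node: lane K loads element ELT[K]
   of an interleaving group that starts at element 0.  */

/* Record the load permutation (PERM[K] is the group element that lane K
   loads) and return true if it is the identity.  */
static bool
compute_load_permutation (const int *elt, size_t nlanes, int *perm)
{
  bool identity = true;
  for (size_t k = 0; k < nlanes; ++k)
    {
      assert (elt[k] >= 0);  /* group element indices are non-negative */
      perm[k] = elt[k];
      if (perm[k] != (int) k)
        identity = false;
    }
  return identity;
}

/* Policy modelled on the commit: the permutation is always computed,
   but a masked load that would need a non-identity one is rejected
   instead of being emitted with the permutation silently ignored.  */
static bool
can_build_slp_load (const int *elt, size_t nlanes, bool masked, int *perm)
{
  bool identity = compute_load_permutation (elt, nlanes, perm);
  return identity || !masked;
}
```

For lanes loading group elements {1, 0, 3, 2}, can_build_slp_load returns false when the load is masked, while the same unmasked lanes are accepted with perm = {1, 0, 3, 2} left for later permute handling.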

2024-03-18 rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #2 from Richard Biener  ---
I see that get_load_store_type and get_group_load_store_type return VMAT_*
kinds that do not handle the masked case.  The callers reject some of them,
but that rejection is incomplete: VMAT_ELEMENTWISE and VMAT_STRIDED_SLP are
not rejected, and I also think VMAT_CONTIGUOUS_PERMUTE/REVERSE lack some
handling, at least the optab check.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e8617439a48..5a4eb136c6d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10080,6 +10080,14 @@ vectorizable_load (vec_info *vinfo,
                              "unsupported masked emulated gather.\n");
           return false;
         }
+      else if (memory_access_type == VMAT_ELEMENTWISE
+               || memory_access_type == VMAT_STRIDED_SLP)
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "unsupported masked strided access.\n");
+          return false;
+        }
     }
 
   bool costing_p = !vec_stmt;

should plug the VMAT_ELEMENTWISE/VMAT_STRIDED_SLP case.  Interestingly, the following testcase

int a[64], b[64];  /* assumed declarations; not shown in the original comment */

void __attribute__((noipa))
foo (int s, int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 0;
      if (a[i])
        tem = p[s*i];
      b[i] = tem;
    }
}

uses VMAT_GATHER_SCATTER; I failed to get it to produce VMAT_ELEMENTWISE.
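To make concrete what a masked VMAT_GATHER_SCATTER access has to compute for foo, this C sketch compares the scalar loop with a lane-by-lane emulation of a masked gather. It is an illustration only, not GCC output: the vector length VF = 4, the names foo_scalar and foo_masked_gather, and passing a and b as parameters (their declarations are not shown in the comment) are all assumptions. The scalar loop only dereferences p[s*i] when a[i] is nonzero, so inactive lanes of the gather must not read p either.

```c
#include <assert.h>

#define N 64
#define VF 4   /* assumed vector length, for illustration only */

/* Scalar reference: the loop from foo, with a and b as parameters.  */
static void
foo_scalar (const int *a, int *b, int s, const int *p)
{
  for (int i = 0; i < N; ++i)
    {
      int tem = 0;
      if (a[i])
        tem = p[s * i];
      b[i] = tem;
    }
}

/* Emulated masked gather: VF lanes per iteration; inactive lanes never
   touch p and produce 0, matching the scalar "tem = 0" default.  */
static void
foo_masked_gather (const int *a, int *b, int s, const int *p)
{
  assert (N % VF == 0);  /* no epilogue loop in this sketch */
  for (int i = 0; i < N; i += VF)
    {
      int mask[VF], offs[VF], vec[VF];
      for (int l = 0; l < VF; ++l)
        {
          mask[l] = a[i + l] != 0;   /* vector compare builds the mask */
          offs[l] = s * (i + l);     /* per-lane gather offsets */
        }
      for (int l = 0; l < VF; ++l)  /* the masked gather itself */
        vec[l] = mask[l] ? p[offs[l]] : 0;
      for (int l = 0; l < VF; ++l)  /* contiguous store to b */
        b[i + l] = vec[l];
    }
}
```

Because the stride s is only known at run time, the offsets cannot be folded into a fixed permutation, which is consistent with the access being classified as a gather rather than a contiguous or elementwise load.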

2024-03-18 rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |11.4.0, 12.3.0, 13.2.1,
                   |                            |14.0
   Target Milestone|---                         |11.5
           Keywords|                            |wrong-code
            Summary|Wrong vectorization of      |[11/12/13/14 Regression]
                   |permuted mask load          |Wrong vectorization of
                   |                            |permuted mask load
      Known to work|                            |10.5.0
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
I think it's most sensible to reject permuted mask (and also gather) SLP loads
for now; code generation definitely doesn't handle them.

A first step towards supporting them might be to compute the load permutation,
hope that SLP permute optimization turns it into the identity, and only reject
it later if it does not.  SLP permute optimization might also be able to push
the permute onto the mask and re-materialize it after the masked load.
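The idea of pushing the permute onto the mask can be checked against the intended semantics, and against the wrong-code of simply dropping the permutation, on a single 4-lane group. This C sketch is a simplified model, not vectorizer code: the function names are hypothetical, LANES = 4 is assumed, and perm is assumed to be a bijection on 0..LANES-1 (a group without gaps or duplicate lanes).

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 4  /* assumed group size and vector length */

/* Intended semantics of a permuted masked SLP load:
   lane K yields MEM[PERM[K]] if MASK[K] is set, else 0.  */
static void
permuted_masked_load_ref (const int *mem, const int *perm,
                          const bool *mask, int *out)
{
  for (int k = 0; k < LANES; ++k)
    out[k] = mask[k] ? mem[perm[k]] : 0;
}

/* Simplified model of the wrong-code: the permutation is ignored, so
   lane K reads MEM[K] instead of MEM[PERM[K]].  */
static void
permuted_masked_load_dropped (const int *mem, const bool *mask, int *out)
{
  for (int k = 0; k < LANES; ++k)
    out[k] = mask[k] ? mem[k] : 0;
}

/* Push the permute onto the mask: move each lane's mask bit to the
   memory element it reads, do one contiguous masked load, then apply
   the permute to the loaded data.  Requires PERM to be a bijection.  */
static void
permuted_masked_load_via_mask (const int *mem, const int *perm,
                               const bool *mask, int *out)
{
  bool mem_mask[LANES] = { false };
  int loaded[LANES];

  for (int k = 0; k < LANES; ++k)
    {
      assert (perm[k] >= 0 && perm[k] < LANES);
      mem_mask[perm[k]] = mask[k];
    }

  /* Contiguous masked load: inactive elements are not read.  */
  for (int j = 0; j < LANES; ++j)
    loaded[j] = mem_mask[j] ? mem[j] : 0;

  /* Re-materialize the permute after the load.  */
  for (int k = 0; k < LANES; ++k)
    out[k] = loaded[perm[k]];
}
```

With mem = {10, 20, 30, 40}, perm = {1, 0, 3, 2} and mask = {1, 0, 1, 1}, the reference and the mask-permuting version both give {20, 0, 40, 30}, while the dropped-permutation model gives {10, 0, 30, 40}.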

I'll note we don't handle SLP masked loads with gaps either.

I'll try to produce a fix on the SLP discovery side (rejecting such loads there).

Richard - any other thoughts on this?

GCC 10 seems fine, but it can already do SLP of masked loads, so I'm not
really sure.  I guess the testcase is not good enough to trigger it there.