Noticed while (still...) working on PR84038.  The vectorizer happily
tries to construct a V4SFmode from two V2SFmode vectors because
there's an optab handler for it.  But it failed to check whether
that mode is supported and RTL expansion later uses TYPE_MODE
to get at the element mode which ends up as BLKmode and thus
we go through the stack...

So this makes the vectorizer test targetm.vector_mode_supported_p
as well before making use of such types.  In the above case the
vectorizer then resorts to using two DImode scalars instead.
I've verified that's still faster than doing four SFmode scalar
loads despite whatever reformatting penalty that might occur.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

For PR84038 this makes a difference when compiling with
-mprefer-avx128 -fno-vect-cost-model.

Richard.

2018-02-08  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/84278
        * tree-vect-stmts.c (vectorizable_store): When looking for
        smaller vector types to perform grouped strided loads/stores
        make sure the mode is supported by the target.
        (vectorizable_load): Likewise.

        * gcc.target/i386/pr84278.c: New testcase.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c       (revision 257477)
+++ gcc/tree-vect-stmts.c       (working copy)
@@ -6510,6 +6558,7 @@ vectorizable_store (gimple *stmt, gimple
              machine_mode vmode;
              if (!mode_for_vector (elmode, group_size).exists (&vmode)
                  || !VECTOR_MODE_P (vmode)
+                 || !targetm.vector_mode_supported_p (vmode)
                  || (convert_optab_handler (vec_extract_optab,
                                             TYPE_MODE (vectype), vmode)
                      == CODE_FOR_nothing))
@@ -6528,6 +6577,7 @@ vectorizable_store (gimple *stmt, gimple
                     element size stores.  */
                  if (mode_for_vector (elmode, lnunits).exists (&vmode)
                      && VECTOR_MODE_P (vmode)
+                     && targetm.vector_mode_supported_p (vmode)
                      && (convert_optab_handler (vec_extract_optab,
                                                 vmode, elmode)
                          != CODE_FOR_nothing))
@@ -7573,6 +7633,7 @@ vectorizable_load (gimple *stmt, gimple_
              machine_mode vmode;
              if (mode_for_vector (elmode, group_size).exists (&vmode)
                  && VECTOR_MODE_P (vmode)
+                 && targetm.vector_mode_supported_p (vmode)
                  && (convert_optab_handler (vec_init_optab,
                                             TYPE_MODE (vectype), vmode)
                      != CODE_FOR_nothing))
@@ -7598,6 +7659,7 @@ vectorizable_load (gimple *stmt, gimple_
                     element loads of the original vector type.  */
                  if (mode_for_vector (elmode, lnunits).exists (&vmode)
                      && VECTOR_MODE_P (vmode)
+                     && targetm.vector_mode_supported_p (vmode)
                      && (convert_optab_handler (vec_init_optab, vmode, elmode)
                          != CODE_FOR_nothing))
                    {
Index: gcc/testsuite/gcc.target/i386/pr84278.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr84278.c     (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr84278.c     (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+float A[1024];
+float B[1024];
+int s;
+
+void foo(void)
+{
+  int i;
+  for (i = 0; i < 128; i++)
+    {
+      B[i*2+0] = A[i*s+0];
+      B[i*2+1] = A[i*s+1];
+    }
+}
+
+/* { dg-final { scan-assembler-not "\(%.sp\)" } } */

Reply via email to