[Bug tree-optimization/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-02 glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra

--- Comment #4 from Marc Glisse ---
IMO this should be rtl-optimization or middle-end. The .optimized dump looks
fine to me. Expansion pulls many of the vec_duplicate operations to the
beginning of the loop (they were interleaved with their uses before), which
lengthens their live ranges considerably, and nothing moves them back closer
to their uses. I don't know whether doing the reads early, as GCC chooses to
do, can ever compensate for having to spill on this testcase, since the memory
access pattern seems quite cache-friendly.

[Bug tree-optimization/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-02 trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Markus Trippelsdorf changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #3 from Markus Trippelsdorf ---
Apparently started with r217608.

The following hacky partial revert fixes the issue:

diff --git a/gcc/config/i386/avxintrin.h b/gcc/config/i386/avxintrin.h
index b5730f842a7c..c36ebc3dce70 100644
--- a/gcc/config/i386/avxintrin.h
+++ b/gcc/config/i386/avxintrin.h
@@ -145,7 +145,7 @@ _mm256_add_pd (__m256d __A, __m256d __B)
 extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_ps (__m256 __A, __m256 __B)
 {
-  return (__m256) ((__v8sf)__A + (__v8sf)__B);
+  return (__m256) __builtin_ia32_addps256 ((__v8sf)__A, (__v8sf)__B);
 }

 extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
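
For readers without the header at hand, here is a minimal sketch of what the
"-" line in the patch amounts to, written with GCC's generic vector extensions
so it does not depend on avxintrin.h. The names v8sf and add_generic are
illustrative only and are not taken from GCC's sources. As far as I understand
it, the point is that the addition is an ordinary vector "+", which the tree
optimizers see as a plain PLUS_EXPR, whereas the builtin in the "+" line stays
an opaque call until expansion.

```c
/* Sketch only: v8sf and add_generic are hypothetical names chosen for
   illustration, they are not the definitions used in avxintrin.h.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Generic form, as in the "-" line of the patch: a plain vector "+".
   Operands are passed by pointer so the sketch also compiles cleanly
   when AVX is not enabled (no ABI warning for 32-byte vector
   parameters).  */
void
add_generic (const v8sf *a, const v8sf *b, v8sf *out)
{
  *out = *a + *b;	/* element-wise addition of the 8 float lanes */
}
```

Whether the generic form or the builtin leads to better register allocation
on the testcase is exactly what this bug is about, so the sketch only shows
the two shapes of code and makes no claim about the generated assembly.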

[Bug tree-optimization/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-02 trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Markus Trippelsdorf changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.9.3
            Summary|bad SIMD register           |[5/6/7 Regression] bad SIMD
                   |allocation                  |register allocation
      Known to fail|                            |5.4.0, 6.2.0, 7.0

--- Comment #2 from Markus Trippelsdorf ---
4.9.3 generates fine code, so this is a regression.