[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-10-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|---                         |FIXED
   Target Milestone|6.5                         |7.3

--- Comment #9 from Jakub Jelinek  ---
The GCC 6 branch is being closed; this is fixed in 7.x.

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #8 from rguenther at suse dot de  ---
On Thu, 24 May 2018, jason.vas.dias at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891
> 
> --- Comment #7 from Jason Vas Dias  ---
> Aha!
> Yes, I was experimenting with the new '-march=haswell' and '-mtune=intel'
> options (which seem to me to be the wrong way round: shouldn't 'haswell' be
> an '-mtune' option and 'intel' an '-march' option? According to the
> documentation, this is not the case).
> GCC 6.4.1 was configured with:
>
> ./configure \
>   --prefix=/usr/local --libdir=/usr/local/lib64 --enable-languages=all \
>   --enable-targets=all --enable-multilib --enable-threads=posix --enable-lto \
>   --with-cpu-64=intel --with-cpu-32=generic \
>   --with-arch-64=haswell --with-tune-64=intel --with-arch-32=i686 \
>   --with-fp=sse+387 --with-tune-32=generic --enable-shared \
>   --with-pic --with-gmp=/usr/local --with-isl=/usr/local \
>   --with-cloog=/usr/local --with-mpc=/usr/local --with-isl=/usr/local \
>   --with-system-zlib --with-gnu-ld --with-gnu-as --enable-serial-configure \
>   --host=x86_64-linux-gnu --build=x86_64-linux-gnu --target=x86_64-linux-gnu
> 
> What I am trying to achieve is that the DEFAULT 64-bit platform for the
> compiler (the target the compiler builds for without any '-m=yyy' options)
> should be '-march=haswell -mtune=intel', which I think should be equivalent
> to the older options '-march=x86-64 -mtune=haswell', and to '-mtune=native'
> on this platform. Please let me know if this is not the case.
>
> The 5.5.0 & 7.3.1 compilers were built with
>   '--with-arch64=x86-64 --with-cpu64=haswell',
> but re-reading the updated 6.4.1 '-mtune'/'-march' documentation led me to
> believe that the new '--with-arch-64=haswell --with-tune-64=intel' options
> were more appropriate. I guess not?
> (The 5.5.0 and 7.3.1 builds are 6 months and 2 months old, from before the
> '-march=haswell' support.)
>
> I will try rebuilding 6.3.1 with '--with-arch64=x86-64 --with-cpu64=haswell'
> and retest.  Thanks!

The testsuite is mostly "tuned" to the defaults, that is -march=x86-64
and -mtune=generic.  So you likely won't have luck with the above
choice either.

The testcase could be improved to handle the situation more gracefully,
but really there's no point in doing that on the old GCC 6 branch.
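
If the aim is only to see whether the test passes with the arch/tune values named above, one option is to pass them explicitly through the DejaGnu board instead of rebuilding the compiler. The following is a minimal sketch, not something proposed in this thread: the helper name `default_board_runtestflags` is made up, and it assumes the usual `--target_board=unix/<flag>/<flag>` form of `RUNTESTFLAGS` together with the `vect.exp=slp-pr56812*` selector used elsewhere in this report.

```shell
# Hypothetical helper: build a RUNTESTFLAGS value that runs a single vect.exp
# test with -march=x86-64 -mtune=generic passed explicitly, so the compiler's
# configured defaults do not matter for that run.
# $1 is the test selector (defaults to the test from this report).
default_board_runtestflags() {
    printf '%s %s\n' \
        '--target_board=unix/-march=x86-64/-mtune=generic' \
        "vect.exp=${1:-slp-pr56812*}"
}
```

It would be used from the gcc build directory along the lines of `make check-g++ RUNTESTFLAGS="$(default_board_runtestflags)"`.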

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread jason.vas.dias at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #7 from Jason Vas Dias  ---
Aha!
Yes, I was experimenting with the new '-march=haswell' and '-mtune=intel'
options (which seem to me to be the wrong way round: shouldn't 'haswell' be
an '-mtune' option and 'intel' an '-march' option? According to the
documentation, this is not the case).
GCC 6.4.1 was configured with:

./configure \
  --prefix=/usr/local --libdir=/usr/local/lib64 --enable-languages=all \
  --enable-targets=all --enable-multilib --enable-threads=posix --enable-lto \
  --with-cpu-64=intel --with-cpu-32=generic \
  --with-arch-64=haswell --with-tune-64=intel --with-arch-32=i686 \
  --with-fp=sse+387 --with-tune-32=generic --enable-shared \
  --with-pic --with-gmp=/usr/local --with-isl=/usr/local \
  --with-cloog=/usr/local --with-mpc=/usr/local --with-isl=/usr/local \
  --with-system-zlib --with-gnu-ld --with-gnu-as --enable-serial-configure \
  --host=x86_64-linux-gnu --build=x86_64-linux-gnu --target=x86_64-linux-gnu

What I am trying to achieve is that the DEFAULT 64-bit platform for the
compiler (the target the compiler builds for without any '-m=yyy' options)
should be '-march=haswell -mtune=intel', which I think should be equivalent
to the older options '-march=x86-64 -mtune=haswell', and to '-mtune=native'
on this platform. Please let me know if this is not the case.

The 5.5.0 & 7.3.1 compilers were built with
  '--with-arch64=x86-64 --with-cpu64=haswell',
but re-reading the updated 6.4.1 '-mtune'/'-march' documentation led me to
believe that the new '--with-arch-64=haswell --with-tune-64=intel' options
were more appropriate. I guess not?
(The 5.5.0 and 7.3.1 builds are 6 months and 2 months old, from before the
'-march=haswell' support.)

I will try rebuilding 6.3.1 with '--with-arch64=x86-64 --with-cpu64=haswell'
and retest.  Thanks!

On 24/05/2018, rguenth at gcc dot gnu.org  wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891
>
> --- Comment #6 from Richard Biener  ---
> The log file shows the loop was already vectorized by loop vectorization.
> How did you configure gcc?  It might be you configured a default
> -march/tune that doesn't match the testcase expectation (and the testcase
> could probably use -ftree-slp-vectorize instead of -ftree-vectorize).

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #6 from Richard Biener  ---
The log file shows the loop was already vectorized by loop vectorization.  How
did you configure gcc?  It might be you configured a default -march/tune that
doesn't match the testcase expectation (and the testcase could probably use
-ftree-slp-vectorize instead of -ftree-vectorize).
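
One way to answer the configuration question is to ask the compiler driver itself. The following is a minimal sketch under stated assumptions: the helper name `show_gcc_defaults` is made up for illustration, and it relies on `-v` printing a `Configured with:` line on stderr and on `-Q --help=target` listing target options together with the values currently in effect.

```shell
# Hypothetical helper: print how a GCC driver was configured and which
# -march/-mtune values it uses when none are given on the command line.
# $1 is the driver to query (defaults to "gcc").
show_gcc_defaults() {
    # "gcc -v" writes its configuration summary to stderr.
    "${1:-gcc}" -v 2>&1 | grep '^Configured with:'
    # "-Q --help=target" lists target options and their current values;
    # keep only the indented option lines for -march= and -mtune=.
    "${1:-gcc}" -Q --help=target 2>/dev/null \
        | grep -E -- '^[[:space:]]*-m(arch|tune)='
}
```

For example, `show_gcc_defaults g++7` would query the `g++7` driver used elsewhere in this report. If the reported values differ from -march=x86-64 and -mtune=generic, the configured defaults are a likely reason for the test behaving differently.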

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread jason.vas.dias at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #5 from Jason Vas Dias  ---
Could the issue be related to running on different hardware?
The CPU in this machine is a rather old 4-core (8 threads with
Hyper-Threading) Haswell:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping        : 3
microcode       : 0x22
cpu MHz         : 3400.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
xsaveopt dtherm ida arat pln pts
bogomips        : 6784.22
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread jason.vas.dias at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #4 from Jason Vas Dias  ---
The same commands succeed when run with GCC 5.5.0 or GCC 7.3.1:

$ g++5 slp-pr56812.cc -nostdinc++ -std=c++98 -O2 -ftree-vectorize \
    -fno-vect-cost-model -msse2 -fdump-tree-slp-details=gcc5.out -O3 \
    -funroll-loops -fvect-cost-model=dynamic -S -o slp-pr56812.gcc5.s
$ grep 'basic block vectorized' gcc5.out
slp-pr56812.cc:17:16: note: basic block vectorized
$ gcc_7_3_env
$ g++7 slp-pr56812.cc -nostdinc++ -std=c++14 -O2 -ftree-vectorize \
    -fno-vect-cost-model -msse2 -fdump-tree-slp-details=gcc7.out -O3 \
    -funroll-loops -fvect-cost-model=dynamic -S -o slp-pr56812.gcc7.s
$ grep 'basic block vectorized' gcc7.out
slp-pr56812.cc:18:1: note: basic block vectorized

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread jason.vas.dias at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #3 from Jason Vas Dias  ---
Created attachment 44174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44174&action=edit
slp1 log file

Here is the slp1 log file produced by the command:

$ /home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/testsuite/g++/../../xg++ \
  -B/home/devel/OS/gcc-6-branch/host-x86_64-linux-gnu/gcc/testsuite/g++/../../ \
  /home/devel/OS/gcc-6-branch/gcc/testsuite/g++.dg/vect/slp-pr56812.cc \
  -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ \
  -I/home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/include/x86_64-linux-gnu \
  -I/home/devel/OS/gcc-6-branch/x86_64-linux-gnu/libstdc++-v3/include \
  -I/home/devel/OS/gcc-6-branch/libstdc++-v3/libsupc++ \
  -I/home/devel/OS/gcc-6-branch/libstdc++-v3/include/backward \
  -I/home/devel/OS/gcc-6-branch/libstdc++-v3/testsuite/util -fmessage-length=0 \
  -std=c++14 -O2 -ftree-vectorize -fno-vect-cost-model -msse2 \
  -fdump-tree-slp-details -O3 -funroll-loops -fvect-cost-model=dynamic -S \
  -o slp-pr56812.s

It does not contain the string 'basic block vectorized', so the test fails.
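
The check described here amounts to searching the dump for a fixed string. A minimal sketch of that check follows, assuming the dump has been written to a known file (for instance via `-fdump-tree-slp-details=FILE`, as in the GCC 5 and GCC 7 runs in this report). The helper name `count_bb_vectorized` is made up, and the real testsuite does this through its own DejaGnu directives rather than a script like this.

```shell
# Hypothetical helper: count the lines in a dump file that contain the
# string the test looks for. Prints 0 when there is no match.
# $1 is the path to the dump file.
count_bb_vectorized() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure by callers running under "set -e".
    grep -c 'basic block vectorized' "$1" || true
}
```

For example, `count_bb_vectorized gcc5.out` would print a non-zero count for a dump containing a line such as `slp-pr56812.cc:17:16: note: basic block vectorized`, and 0 for a dump like the one attached here.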

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread jason.vas.dias at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

--- Comment #2 from Jason Vas Dias  ---
Created attachment 44173
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44173&action=edit
log file produced by 'make check-g++ RUNTESTFLAGS=vect.exp=slp-pr56812*'

Log file showing test failures as requested

[Bug tree-optimization/85891] [6 Regression] Simple loop is not SLP-vectorized after r196872

2018-05-24 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85891

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |WAITING
            Version|unknown                     |6.3.1
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection
   Last reconfirmed|                            |2018-05-24
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|[6.4.1 regression] Simple   |[6 Regression] Simple loop
                   |loop is not SLP-vectorized  |is not SLP-vectorized after
                   |after r196872               |r196872
   Target Milestone|---                         |6.5

--- Comment #1 from Richard Biener  ---
The test works fine for me on x86_64 Linux (openSUSE Leap 42.2) on the GCC 6
branch (r260441).  I don't see anything host specific in it.

Please cut from the testsuite log the compiler commands and attach the
slp1 dump file.