[Bug rtl-optimization/53533] [4.8/4.9/5/6 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark

2015-06-23 rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.5                       |4.9.3

--- Comment #30 from Richard Biener rguenth at gcc dot gnu.org ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.


2015-05-04 maltsevm at gmail dot com

Mikhail Maltsev maltsevm at gmail dot com changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maltsevm at gmail dot com

--- Comment #28 from Mikhail Maltsev maltsevm at gmail dot com ---
Created attachment 35455
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35455&action=edit
testcase, inlining

When -DINLINE_MANUALLY is defined, this testcase marks some functions with
__attribute__((always_inline)) and others with __attribute__((noinline)).
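A minimal sketch of how such a compile-time switch can be wired up. The macro names FORCE_INLINE and NO_INLINE and the functions step/run_kernel are hypothetical and do not come from the attachment; only the two GCC attributes and the INLINE_MANUALLY define are taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macros: only with -DINLINE_MANUALLY do they expand to GCC's
// always_inline / noinline attributes; otherwise the compiler's own
// inlining heuristics decide.
#ifdef INLINE_MANUALLY
#define FORCE_INLINE inline __attribute__((always_inline))
#define NO_INLINE __attribute__((noinline))
#else
#define FORCE_INLINE inline
#define NO_INLINE
#endif

// Hypothetical inner operation that should be merged into the loop body.
FORCE_INLINE int32_t step(int32_t acc, int32_t x) {
    return acc + x;
}

// Hypothetical timed kernel, kept out of line so it is measured as a unit.
NO_INLINE int32_t run_kernel(const int32_t* data, int n) {
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc = step(acc, data[i]);
    return acc;
}
```

Building once with and once without -DINLINE_MANUALLY gives the same results; only the inlining decisions, and therefore the generated code, differ.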


2015-05-04 maltsevm at gmail dot com

--- Comment #29 from Mikhail Maltsev maltsevm at gmail dot com ---
Results for attached testcase:

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (Haswell)
g++ -O3 -march=native -mtune=native
1 iteration

Clang 3.7
Total absolute time for int32_t for loop unrolling: 0.99 sec
Total absolute time for int32_t do loop unrolling: 1.00 sec
Total absolute time for double for loop unrolling: 1.37 sec
Total absolute time for double do loop unrolling: 1.37 sec

GCC 4.7.4
Total absolute time for int32_t for loop unrolling: 5.88 sec
Total absolute time for int32_t do loop unrolling: 7.57 sec
Total absolute time for double for loop unrolling: 2.29 sec
Total absolute time for double do loop unrolling: 2.45 sec

GCC 4.8.4
Total absolute time for int32_t for loop unrolling: 3.12 sec
Total absolute time for int32_t do loop unrolling: 3.29 sec
Total absolute time for double for loop unrolling: 1.13 sec
Total absolute time for double do loop unrolling: 1.14 sec

GCC 4.9.2
Total absolute time for int32_t for loop unrolling: 3.02 sec
Total absolute time for int32_t do loop unrolling: 3.29 sec
Total absolute time for double for loop unrolling: 1.10 sec
Total absolute time for double do loop unrolling: 1.13 sec

GCC 6
Total absolute time for int32_t for loop unrolling: 5.95 sec
Total absolute time for int32_t do loop unrolling: 6.95 sec
Total absolute time for double for loop unrolling: 2.39 sec
Total absolute time for double do loop unrolling: 2.39 sec

g++ -DINLINE_MANUALLY -O3 -march=native -mtune=native
5 iterations

Clang 3.7
Total absolute time for int32_t for loop unrolling: 2.43 sec
Total absolute time for int32_t do loop unrolling: 2.32 sec
Total absolute time for double for loop unrolling: 6.38 sec
Total absolute time for double do loop unrolling: 6.38 sec

GCC 4.9.2
Total absolute time for int32_t for loop unrolling: 10.17 sec
Total absolute time for int32_t do loop unrolling: 10.16 sec
Total absolute time for double for loop unrolling: 3.89 sec
Total absolute time for double do loop unrolling: 3.90 sec

GCC 6
Total absolute time for int32_t for loop unrolling: 10.10 sec
Total absolute time for int32_t do loop unrolling: 10.12 sec
Total absolute time for double for loop unrolling: 3.90 sec
Total absolute time for double do loop unrolling: 3.89 sec

g++ -DINLINE_MANUALLY -Ofast -march=native -mtune=native
GCC 6
Total absolute time for int32_t for loop unrolling: 10.11 sec
Total absolute time for int32_t do loop unrolling: 10.11 sec
Total absolute time for double for loop unrolling: 1.14 sec
Total absolute time for double do loop unrolling: 1.15 sec

So, IMHO, there is no regression here (at least not with respect to
vectorization). The floating-point loop gets constant-folded if reassociation
is allowed (-Ofast). Also, GCC 6 is able to infer that the for and do-while
tests are semantically equivalent and unifies them.
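For context on the -Ofast result: floating-point addition is not associative, so GCC may only rewrite a chain of additions (for example into a single multiplication) when -fassociative-math is in effect, which -Ofast turns on through -ffast-math. The following is a minimal illustration with hypothetical function names, not code from the attached testcase:

```cpp
#include <cassert>

// Adds x to an accumulator n times, in source order. Without
// -fassociative-math the compiler has to preserve this order.
double sequential_sum(double x, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += x;
    return acc;
}

// The closed form a reassociating compiler could substitute:
// one multiplication, hence a single rounding instead of n.
double reassociated_sum(double x, int n) {
    assert(n >= 0);
    return static_cast<double>(n) * x;
}
```

On an IEEE-754 double target, sequential_sum(0.1, 10) yields 0.9999999999999999 while reassociated_sum(0.1, 10) yields exactly 1.0, so the rewrite can change the result and is only permitted once strict floating-point semantics are relaxed.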


2015-05-03 trippels at gcc dot gnu.org

--- Comment #27 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Created attachment 35448
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35448&action=edit
testcase


2015-05-03 trippels at gcc dot gnu.org

Markus Trippelsdorf trippels at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2012-05-31 00:00:00         |2015-5-3
                 CC|                            |trippels at gcc dot gnu.org

--- Comment #26 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
For gcc-5 and gcc-6 there is an additional 50% slowdown:

 % g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 1
./loop_unroll 1 

test     description                    absolute   operations   ratio with
number                                  time       per second   test0

 0  int32_t for loop unroll 1   0.14 sec   552.30 M 1.00
 1  int32_t for loop unroll 2   0.11 sec   699.49 M 0.79
 2  int32_t for loop unroll 3   0.14 sec   566.56 M 0.97
 3  int32_t for loop unroll 4   0.15 sec   532.87 M 1.04
 4  int32_t for loop unroll 5   0.10 sec   784.70 M 0.70
 5  int32_t for loop unroll 6   0.09 sec   887.12 M 0.62
 6  int32_t for loop unroll 7   0.09 sec   913.50 M 0.60
 7  int32_t for loop unroll 8   0.08 sec   986.45 M 0.56
 8  int32_t for loop unroll 9   0.23 sec   346.06 M 1.60
 9 int32_t for loop unroll 10   0.08 sec   1040.06 M 0.53
10 int32_t for loop unroll 11   0.23 sec   348.02 M 1.59
11 int32_t for loop unroll 12   0.23 sec   353.38 M 1.56
12 int32_t for loop unroll 13   0.24 sec   338.32 M 1.63
13 int32_t for loop unroll 14   0.24 sec   332.32 M 1.66
14 int32_t for loop unroll 15   0.25 sec   321.15 M 1.72
15 int32_t for loop unroll 16   0.25 sec   318.23 M 1.74
16 int32_t for loop unroll 17   0.24 sec   329.43 M 1.68
17 int32_t for loop unroll 18   0.25 sec   321.34 M 1.72
18 int32_t for loop unroll 19   0.25 sec   314.53 M 1.76
19 int32_t for loop unroll 20   0.25 sec   325.33 M 1.70
20 int32_t for loop unroll 21   0.25 sec   323.67 M 1.71
21 int32_t for loop unroll 22   0.25 sec   316.85 M 1.74
22 int32_t for loop unroll 23   0.25 sec   323.51 M 1.71
23 int32_t for loop unroll 24   0.06 sec   1257.94 M 0.44
24 int32_t for loop unroll 25   0.24 sec   327.77 M 1.69
25 int32_t for loop unroll 26   0.06 sec   1310.44 M 0.42
26 int32_t for loop unroll 27   0.07 sec   1072.85 M 0.51
27 int32_t for loop unroll 28   0.28 sec   283.44 M 1.95
28 int32_t for loop unroll 29   0.30 sec   267.96 M 2.06
29 int32_t for loop unroll 30   0.31 sec   258.88 M 2.13
30 int32_t for loop unroll 31   0.06 sec   1337.64 M 0.41
31 int32_t for loop unroll 32   0.06 sec   1315.10 M 0.42

Total absolute time for int32_t for loop unrolling: 5.85 sec
...
./loop_unroll 1  41.43s user 0.00s system 100% cpu 41.426 total
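The "ratio with test0" column appears to be each test's time divided by the time of test 0 (the non-unrolled loop), so values above 1.00 mark unroll factors that run slower than the plain loop. Assuming every test performs the same number of operations (inferred from the printed numbers, not from the benchmark source), the ratio can be recomputed from the "operations per second" column alone. The function names below are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Assumption: every test performs the same number of operations, so
// time(N) / time(0) == ops_per_sec(0) / ops_per_sec(N).
double ratio_with_test0(double mops_test0, double mops_testN) {
    assert(mops_test0 > 0.0 && mops_testN > 0.0);
    return mops_test0 / mops_testN;
}

// Ratio in hundredths, matching the two decimals the benchmark prints.
long ratio_hundredths(double mops_test0, double mops_testN) {
    return std::lround(ratio_with_test0(mops_test0, mops_testN) * 100.0);
}
```

For example, 552.30 / 699.49 gives 0.79 for test 1 and 552.30 / 346.06 gives 1.60 for test 8, matching the table. Recomputing from the time column instead (0.23 / 0.14 ≈ 1.64 for test 8) does not reproduce the printed value, because the times are shown with only two decimals.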

==

 % /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.2/g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 1
./loop_unroll 1 

test     description                    absolute   operations   ratio with
number                                  time       per second   test0

 0  int32_t for loop unroll 1   0.14 sec   582.13 M 1.00
 1  int32_t for loop unroll 2   0.13 sec   625.41 M 0.93
 2  int32_t for loop unroll 3   0.13 sec   635.76 M 0.92
 3  int32_t for loop unroll 4   0.13 sec   625.41 M 0.93
 4  int32_t for loop unroll 5   0.12 sec   640.96 M 0.91
 5  int32_t for loop unroll 6   0.09 sec   888.11 M 0.66
 6  int32_t for loop unroll 7   0.09 sec   900.10 M 0.65
 7  int32_t for loop unroll 8   0.10 sec   832.20 M 0.70
 8  int32_t for loop unroll 9   0.10 sec   834.22 M 0.70
 9 int32_t for loop unroll 10   0.09 sec   902.04 M 0.65
10 int32_t for loop unroll 11   0.10 sec   805.15 M 0.72
11 int32_t for loop unroll 12   0.10 sec   823.27 M 0.71
12 int32_t for loop unroll 13   0.09 sec   860.51 M 0.68
13 int32_t for loop unroll 14   0.11 sec   753.59 M 0.77
14 int32_t for loop unroll 15   0.10 sec   781.96 M 0.74
15 int32_t for loop unroll 16   0.09 sec   858.76 M 0.68
16 int32_t for loop unroll 17   0.09 sec   846.91 M 0.69
17 int32_t for loop unroll 18   0.10 sec   783.19 M 0.74
18 int32_t for loop unroll 19   0.10 sec   794.81 M 0.73
19 int32_t for loop unroll 20   0.10 sec   806.70 M 0.72
20 int32_t for loop unroll 21   0.10 sec   823.82 M 0.71
21 int32_t for loop unroll 22   0.09 sec   851.74 M 0.68
22 int32_t for loop unroll 23   0.10 sec   792.87 M 0.73
23 int32_t for loop unroll 24   0.10 sec   809.32 M 0.72
24 int32_t for loop unroll 25   0.10 sec   832.18 M 0.70
25 int32_t for loop unroll 26   0.10 sec   781.11 M 0.75
26 int32_t for loop unroll 27   0.10 sec   792.40 M 0.73
27 int32_t for loop unroll 28   0.10 sec   817.22 M 0.71
28 int32_t for loop unroll 29   0.10 sec   826.40 M 0.70
29 int32_t for loop unroll 30   0.10 sec   803.83 M