[Bug rtl-optimization/53533] [4.8/4.9/5/6 regression] vectorization causes loop unrolling test slowdown as measured by Adobe's C++Benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
   Target Milestone|4.8.5   |4.9.3

--- Comment #30 from Richard Biener rguenth at gcc dot gnu.org ---
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

Mikhail Maltsev maltsevm at gmail dot com changed:

           What    |Removed |Added
                 CC|        |maltsevm at gmail dot com

--- Comment #28 from Mikhail Maltsev maltsevm at gmail dot com ---
Created attachment 35455
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35455&action=edit
testcase, inlining

This testcase marks some functions with __attribute__((always_inline)) or
__attribute__((noinline)) when -DINLINE_MANUALLY is defined.
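For readers without the attachment at hand, the following is a minimal sketch of how such conditional marking is commonly written. It is not taken from attachment 35455: the macro names FORCE_INLINE and NO_INLINE and both functions are hypothetical, and the real testcase may arrange things differently.

```cpp
#include <cstdint>

// Hypothetical macro names; the attached testcase may spell these differently.
#ifdef INLINE_MANUALLY
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE __attribute__((noinline))
#else
#  define FORCE_INLINE inline
#  define NO_INLINE
#endif

// Small per-element operation that should always be inlined into the loop.
FORCE_INLINE int32_t add_constant(int32_t x) { return x + 10; }

// Loop kernel kept out of line so that it is compiled as a separate function
// regardless of the inliner's own heuristics.
NO_INLINE int32_t sum_with_constant(const int32_t *data, int n) {
    int32_t result = 0;
    for (int i = 0; i < n; ++i)
        result += add_constant(data[i]);
    return result;
}
```

Building with and without -DINLINE_MANUALLY then shows whether a timing difference comes from inlining decisions rather than from the loop optimizers.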
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

--- Comment #29 from Mikhail Maltsev maltsevm at gmail dot com ---
Results for attached testcase:
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz (Haswell)

g++ -O3 -march=native -mtune=native
1 iterations

Clang 3.7
Total absolute time for int32_t for loop unrolling: 0.99 sec
Total absolute time for int32_t do loop unrolling: 1.00 sec
Total absolute time for double for loop unrolling: 1.37 sec
Total absolute time for double do loop unrolling: 1.37 sec

GCC 4.7.4
Total absolute time for int32_t for loop unrolling: 5.88 sec
Total absolute time for int32_t do loop unrolling: 7.57 sec
Total absolute time for double for loop unrolling: 2.29 sec
Total absolute time for double do loop unrolling: 2.45 sec

GCC 4.8.4
Total absolute time for int32_t for loop unrolling: 3.12 sec
Total absolute time for int32_t do loop unrolling: 3.29 sec
Total absolute time for double for loop unrolling: 1.13 sec
Total absolute time for double do loop unrolling: 1.14 sec

GCC 4.9.2
Total absolute time for int32_t for loop unrolling: 3.02 sec
Total absolute time for int32_t do loop unrolling: 3.29 sec
Total absolute time for double for loop unrolling: 1.10 sec
Total absolute time for double do loop unrolling: 1.13 sec

GCC 6
Total absolute time for int32_t for loop unrolling: 5.95 sec
Total absolute time for int32_t do loop unrolling: 6.95 sec
Total absolute time for double for loop unrolling: 2.39 sec
Total absolute time for double do loop unrolling: 2.39 sec

g++ -DINLINE_MANUALLY -O3 -march=native -mtune=native
5 iterations

Clang 3.7
Total absolute time for int32_t for loop unrolling: 2.43 sec
Total absolute time for int32_t do loop unrolling: 2.32 sec
Total absolute time for double for loop unrolling: 6.38 sec
Total absolute time for double do loop unrolling: 6.38 sec

GCC 4.9.2
Total absolute time for int32_t for loop unrolling: 10.17 sec
Total absolute time for int32_t do loop unrolling: 10.16 sec
Total absolute time for double for loop unrolling: 3.89 sec
Total absolute time for double do loop unrolling: 3.90 sec

GCC 6
Total absolute time for int32_t for loop unrolling: 10.10 sec
Total absolute time for int32_t do loop unrolling: 10.12 sec
Total absolute time for double for loop unrolling: 3.90 sec
Total absolute time for double do loop unrolling: 3.89 sec

g++ -DINLINE_MANUALLY -Ofast -march=native -mtune=native

GCC 6
Total absolute time for int32_t for loop unrolling: 10.11 sec
Total absolute time for int32_t do loop unrolling: 10.11 sec
Total absolute time for double for loop unrolling: 1.14 sec
Total absolute time for double do loop unrolling: 1.15 sec

So, IMHO there is no regression here (at least w.r.t. vectorization). Floating
point loop gets constant-folded, if reassociation is allowed. Also, GCC6 is able
to infer that for and while tests are semantically equivalent and unifies them.
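
As a side note on why reassociation has to be explicitly allowed: floating-point addition is not associative, so a compiler following strict IEEE semantics may not regroup a chain of additions, while -Ofast (which enables -ffast-math) permits it. The sketch below is illustrative only and is not part of the attached testcase; both function names are hypothetical. It assumes a build without -ffast-math or -Ofast, since with those flags the compiler may treat the two functions as equivalent.

```cpp
// Adds strictly left to right: (a + b) + c.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Same operands, regrouped: a + (b + c). Under strict IEEE rules the compiler
// may not turn the function above into this one on its own.
double sum_reassociated(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1.0, b = 1e20 and c = -1e20, the first function returns 0.0 because the 1.0 is lost when added to 1e20, while the second returns 1.0. A loop that applies a chain of constant additions to a double can therefore only be folded when this kind of regrouping is permitted.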
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

--- Comment #27 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
Created attachment 35448
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35448&action=edit
testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53533

Markus Trippelsdorf trippels at gcc dot gnu.org changed:

           What    |Removed             |Added
   Last reconfirmed|2012-05-31 00:00:00 |2015-5-3
                 CC|                    |trippels at gcc dot gnu.org

--- Comment #26 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
For gcc-5 and gcc-6 there is an additional 50% slowdown:

 % g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 1
./loop_unroll 1

test   description                  absolute   operations   ratio with
number                              time       per second   test0

 0 int32_t for loop unroll 1    0.14 sec    552.30 M     1.00
 1 int32_t for loop unroll 2    0.11 sec    699.49 M     0.79
 2 int32_t for loop unroll 3    0.14 sec    566.56 M     0.97
 3 int32_t for loop unroll 4    0.15 sec    532.87 M     1.04
 4 int32_t for loop unroll 5    0.10 sec    784.70 M     0.70
 5 int32_t for loop unroll 6    0.09 sec    887.12 M     0.62
 6 int32_t for loop unroll 7    0.09 sec    913.50 M     0.60
 7 int32_t for loop unroll 8    0.08 sec    986.45 M     0.56
 8 int32_t for loop unroll 9    0.23 sec    346.06 M     1.60
 9 int32_t for loop unroll 10   0.08 sec   1040.06 M     0.53
10 int32_t for loop unroll 11   0.23 sec    348.02 M     1.59
11 int32_t for loop unroll 12   0.23 sec    353.38 M     1.56
12 int32_t for loop unroll 13   0.24 sec    338.32 M     1.63
13 int32_t for loop unroll 14   0.24 sec    332.32 M     1.66
14 int32_t for loop unroll 15   0.25 sec    321.15 M     1.72
15 int32_t for loop unroll 16   0.25 sec    318.23 M     1.74
16 int32_t for loop unroll 17   0.24 sec    329.43 M     1.68
17 int32_t for loop unroll 18   0.25 sec    321.34 M     1.72
18 int32_t for loop unroll 19   0.25 sec    314.53 M     1.76
19 int32_t for loop unroll 20   0.25 sec    325.33 M     1.70
20 int32_t for loop unroll 21   0.25 sec    323.67 M     1.71
21 int32_t for loop unroll 22   0.25 sec    316.85 M     1.74
22 int32_t for loop unroll 23   0.25 sec    323.51 M     1.71
23 int32_t for loop unroll 24   0.06 sec   1257.94 M     0.44
24 int32_t for loop unroll 25   0.24 sec    327.77 M     1.69
25 int32_t for loop unroll 26   0.06 sec   1310.44 M     0.42
26 int32_t for loop unroll 27   0.07 sec   1072.85 M     0.51
27 int32_t for loop unroll 28   0.28 sec    283.44 M     1.95
28 int32_t for loop unroll 29   0.30 sec    267.96 M     2.06
29 int32_t for loop unroll 30   0.31 sec    258.88 M     2.13
30 int32_t for loop unroll 31   0.06 sec   1337.64 M     0.41
31 int32_t for loop unroll 32   0.06 sec   1315.10 M     0.42

Total absolute time for int32_t for loop unrolling: 5.85 sec
...
./loop_unroll 1  41.43s user 0.00s system 100% cpu 41.426 total

==

 % /usr/x86_64-pc-linux-gnu/gcc-bin/4.9.2/g++ -O3 loop_unroll.ii -o loop_unroll
 % time ./loop_unroll 1
./loop_unroll 1

test   description                  absolute   operations   ratio with
number                              time       per second   test0

 0 int32_t for loop unroll 1    0.14 sec    582.13 M     1.00
 1 int32_t for loop unroll 2    0.13 sec    625.41 M     0.93
 2 int32_t for loop unroll 3    0.13 sec    635.76 M     0.92
 3 int32_t for loop unroll 4    0.13 sec    625.41 M     0.93
 4 int32_t for loop unroll 5    0.12 sec    640.96 M     0.91
 5 int32_t for loop unroll 6    0.09 sec    888.11 M     0.66
 6 int32_t for loop unroll 7    0.09 sec    900.10 M     0.65
 7 int32_t for loop unroll 8    0.10 sec    832.20 M     0.70
 8 int32_t for loop unroll 9    0.10 sec    834.22 M     0.70
 9 int32_t for loop unroll 10   0.09 sec    902.04 M     0.65
10 int32_t for loop unroll 11   0.10 sec    805.15 M     0.72
11 int32_t for loop unroll 12   0.10 sec    823.27 M     0.71
12 int32_t for loop unroll 13   0.09 sec    860.51 M     0.68
13 int32_t for loop unroll 14   0.11 sec    753.59 M     0.77
14 int32_t for loop unroll 15   0.10 sec    781.96 M     0.74
15 int32_t for loop unroll 16   0.09 sec    858.76 M     0.68
16 int32_t for loop unroll 17   0.09 sec    846.91 M     0.69
17 int32_t for loop unroll 18   0.10 sec    783.19 M     0.74
18 int32_t for loop unroll 19   0.10 sec    794.81 M     0.73
19 int32_t for loop unroll 20   0.10 sec    806.70 M     0.72
20 int32_t for loop unroll 21   0.10 sec    823.82 M     0.71
21 int32_t for loop unroll 22   0.09 sec    851.74 M     0.68
22 int32_t for loop unroll 23   0.10 sec    792.87 M     0.73
23 int32_t for loop unroll 24   0.10 sec    809.32 M     0.72
24 int32_t for loop unroll 25   0.10 sec    832.18 M     0.70
25 int32_t for loop unroll 26   0.10 sec    781.11 M     0.75
26 int32_t for loop unroll 27   0.10 sec    792.40 M     0.73
27 int32_t for loop unroll 28   0.10 sec    817.22 M     0.71
28 int32_t for loop unroll 29   0.10 sec    826.40 M     0.70
29 int32_t for loop unroll 30   0.10 sec    803.83 M