[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #13 from wbrana <wbrana at gmail dot com> 2012-11-30 20:23:40 UTC ---
It seems it is caused by r182844:

         TEST       : Iterations/sec. : Old Index : New Index
r182839  ASSIGNMENT :          64.374 :    244.96 :     63.54
r182844  ASSIGNMENT :          57.697 :    219.55 :     56.95

Author: irar <irar@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Jan 3 13:24:04 2012 +

    PR tree-optimization/51269
    * tree-vect-loop-manip.c (set_prologue_iterations): Make
    first_niters a pointer.
    (slpeel_tree_peel_loop_to_edge): Likewise.
    (vect_do_peeling_for_loop_bound): Update call to
    slpeel_tree_peel_loop_to_edge.
    (vect_gen_niters_for_prolog_loop): Don't compute wide_prolog_niters
    here.  Remove it from the parameters list.
    (vect_do_peeling_for_alignment): Update calls and compute
    wide_prolog_niters.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.7.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #6 from wbrana <wbrana at gmail dot com> 2012-11-17 14:24:44 UTC ---
Created attachment 28715
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28715
Gentoo patches 1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #7 from wbrana <wbrana at gmail dot com> 2012-11-17 14:25:23 UTC ---
Created attachment 28716
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28716
Gentoo patches 2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #8 from wbrana <wbrana at gmail dot com> 2012-11-17 14:26:18 UTC ---
Created attachment 28717
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28717
Gentoo patches 3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #9 from wbrana <wbrana at gmail dot com> 2012-11-17 14:29:20 UTC ---
Created attachment 28718
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28718
build log from non-broken gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #10 from wbrana <wbrana at gmail dot com> 2012-11-17 14:30:22 UTC ---
Created attachment 28719
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28719
build log from broken gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #11 from wbrana <wbrana at gmail dot com> 2012-11-17 14:52:44 UTC ---
It seems I was wrong: reverting r175752 doesn't fix the performance. The build with the patch that reverts r175752 also had the Gentoo patches applied. I didn't think it was possible, but it seems one of the Gentoo patches fixes the performance. Any idea which one?

CPU: Sandy Bridge
CFLAGS = -fomit-frame-pointer -Wall -O3 -funroll-loops -g0 -march=native -ffast-math -fno-PIE -fno-exceptions -fno-stack-protector -static

With the self-contained testcase there is almost no difference in run time between Gentoo-patched and vanilla gcc.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #12 from wbrana <wbrana at gmail dot com> 2012-11-17 15:01:34 UTC ---
More exact CFLAGS:
-fomit-frame-pointer -Wall -O3 -funroll-loops -g0 -march=corei7 -ffast-math -fno-PIE -fno-exceptions -fno-stack-protector -static
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-16 18:28:30 UTC ---
Created attachment 28712
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28712
assign.c

Assignment extracted into a self-contained testcase; does this also make a similar difference for you? On which CPU?

Yes, there is a code generation difference with that commit. In *.optimized the difference seems to be (- vanilla, + with Kai's patch reverted):

@@ -192,13 +192,12 @@ Assignment (long int[101] * arraybase)
   sizetype _302;
   unsigned long _303;
   sizetype _306;
   long unsigned int pretmp_307;
   long unsigned int pretmp_308;
   long int[101] * pretmp_318;
-  unsigned long _322;
   short unsigned int ivtmp_334;
   unsigned long _350;
   unsigned int _351;
   long unsigned int patt_353;
   short unsigned int _354;
   unsigned long _355;
@@ -286,27 +285,26 @@ Assignment (long int[101] * arraybase)
   <bb 5>:
   # currentmin_72 = PHI <currentmin_402(4)>
   _356 = ivtmp.312_453 & 15;
   _350 = _356 >> 3;
   _355 = -_350;
   _354 = (short unsigned int) _355;
-  _322 = _355 & 1;
-  prolog_loop_niters.10_359 = (short unsigned int) _322;
+  prolog_loop_niters.10_359 = _354 & 1;
   if (prolog_loop_niters.10_359 == 0)
     goto <bb 7>;
   else
     goto <bb 6>;

   <bb 6>:
   _272 = MEM[base: pretmp_395, offset: 0B];
   _256 = _272 - currentmin_72;
   MEM[base: pretmp_395, offset: 0B] = _256;

   <bb 7>:
   # j_269 = PHI <1(6), 0(5)>
-  prolog_loop_adjusted_niters.11_124 = _355 & 1;
+  prolog_loop_adjusted_niters.11_124 = (sizetype) prolog_loop_niters.10_359;
   niters.12_129 = 101 - prolog_loop_niters.10_359;
   base_off.19_523 = prolog_loop_adjusted_niters.11_124 * 8;
   vect_p.20_524 = pretmp_395 + base_off.19_523;
   vect_cst_.23_528 = {currentmin_72, currentmin_72};

   <bb 8>:

This change happens very late (forwprop4) and nothing afterwards cleans it up: there is no DCE etc. that would eliminate the dead assignment to _354, and there is no PRE/FRE to replace the second _355 & 1 with _322. Still, just zero-extending _359 is perhaps cheaper register-pressure-wise.
That said, I can't find any measurable differences between the two.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #2 from wbrana <wbrana at gmail dot com> 2012-11-15 16:12:57 UTC ---
Created attachment 28699
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28699
function Assignment without r175752
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #3 from wbrana <wbrana at gmail dot com> 2012-11-15 16:16:05 UTC ---
Created attachment 28700
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28700
function Assignment with r175752

According to gprof, Assignment is called 1574 times without r175752 and 1449 times with r175752.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #4 from wbrana <wbrana at gmail dot com> 2012-11-15 17:01:22 UTC ---
Bytemark source code: http://www.tux.org/~mayer/linux/nbench-byte-2.2.3.tar.gz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286

--- Comment #1 from Mikael Pettersson <mikpe at it dot uu.se> 2012-11-12 13:44:32 UTC ---
r175752 is a follow-up fix to r175589, so my guess is that it's the combination of the two that's causing the regression. Can you construct a small test case that demonstrates the code quality regression from these two revisions?