[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-30 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #13 from wbrana wbrana at gmail dot com 2012-11-30 20:23:40 UTC ---

It seems the regression is caused by r182844.

r182839:
ASSIGNMENT  :  64.374  : 244.96  :  63.54

r182844:
ASSIGNMENT  :  57.697  : 219.55  :  56.95



Author: irar irar@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Jan 3 13:24:04 2012 +

    PR tree-optimization/51269
    * tree-vect-loop-manip.c (set_prologue_iterations): Make
    first_niters a pointer.
    (slpeel_tree_peel_loop_to_edge): Likewise.
    (vect_do_peeling_for_loop_bound): Update call to
    slpeel_tree_peel_loop_to_edge.
    (vect_gen_niters_for_prolog_loop): Don't compute
    wide_prolog_niters here.  Remove it from the parameters list.
    (vect_do_peeling_for_alignment): Update calls and compute
    wide_prolog_niters.


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-25 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed |Added
   ----------------+--------+------
   Target Milestone|---     |4.7.3


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #6 from wbrana wbrana at gmail dot com 2012-11-17 14:24:44 UTC ---

Created attachment 28715

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28715

Gentoo patches 1


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #7 from wbrana wbrana at gmail dot com 2012-11-17 14:25:23 UTC ---

Created attachment 28716

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28716

Gentoo patches 2


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #8 from wbrana wbrana at gmail dot com 2012-11-17 14:26:18 UTC ---

Created attachment 28717

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28717

Gentoo patches 3


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #9 from wbrana wbrana at gmail dot com 2012-11-17 14:29:20 UTC ---

Created attachment 28718

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28718

build log from non-broken gcc


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #10 from wbrana wbrana at gmail dot com 2012-11-17 14:30:22 UTC ---

Created attachment 28719

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28719

build log from broken gcc


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #11 from wbrana wbrana at gmail dot com 2012-11-17 14:52:44 UTC ---

It seems I was wrong. Reverting 175752 doesn't fix performance.
I had also applied the Gentoo patches together with the patch that reverts
175752. I didn't think it was possible, but it seems one of the Gentoo patches
fixes performance. Any idea which?



CPU Sandy Bridge

CFLAGS = -fomit-frame-pointer -Wall -O3 -funroll-loops -g0  -march=native

-ffast-math -fno-PIE -fno-exceptions -fno-stack-protector -static



With the self-contained testcase there is almost no difference in run time
between Gentoo-patched and vanilla gcc.


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-17 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #12 from wbrana wbrana at gmail dot com 2012-11-17 15:01:34 UTC ---

More exact CFLAGS:

-fomit-frame-pointer -Wall -O3 -funroll-loops -g0  -march=corei7

-ffast-math -fno-PIE -fno-exceptions -fno-stack-protector -static


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-16 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-16 18:28:30 UTC ---

Created attachment 28712

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28712

assign.c



Assignment extracted into a self-contained testcase. Does this also make a
similar difference for you?  On which CPU?  Yes, there is a code generation
difference with that commit; in *.optimized the difference seems to be
(- vanilla, + with Kai's patch reverted):

@@ -192,13 +192,12 @@ Assignment (long int[101] * arraybase)
   sizetype _302;
   unsigned long _303;
   sizetype _306;
   long unsigned int pretmp_307;
   long unsigned int pretmp_308;
   long int[101] * pretmp_318;
-  unsigned long _322;
   short unsigned int ivtmp_334;
   unsigned long _350;
   unsigned int _351;
   long unsigned int patt_353;
   short unsigned int _354;
   unsigned long _355;
@@ -286,27 +285,26 @@ Assignment (long int[101] * arraybase)
   <bb 5>:
   # currentmin_72 = PHI <currentmin_402(4)>
   _356 = ivtmp.312_453 & 15;
   _350 = _356 >> 3;
   _355 = -_350;
   _354 = (short unsigned int) _355;
-  _322 = _355 & 1;
-  prolog_loop_niters.10_359 = (short unsigned int) _322;
+  prolog_loop_niters.10_359 = _354 & 1;
   if (prolog_loop_niters.10_359 == 0)
     goto <bb 7>;
   else
     goto <bb 6>;

   <bb 6>:
   _272 = MEM[base: pretmp_395, offset: 0B];
   _256 = _272 - currentmin_72;
   MEM[base: pretmp_395, offset: 0B] = _256;

   <bb 7>:
   # j_269 = PHI <1(6), 0(5)>
-  prolog_loop_adjusted_niters.11_124 = _355 & 1;
+  prolog_loop_adjusted_niters.11_124 = (sizetype) prolog_loop_niters.10_359;
   niters.12_129 = 101 - prolog_loop_niters.10_359;
   base_off.19_523 = prolog_loop_adjusted_niters.11_124 * 8;
   vect_p.20_524 = pretmp_395 + base_off.19_523;
   vect_cst_.23_528 = {currentmin_72, currentmin_72};

   <bb 8>:



This change happens very late (forwprop4) and nothing afterwards cleans it up
(there is no DCE etc. that would remove the dead assignment to _354, and there
is no PRE/FRE to replace _355 & 1 in the second case with _322).  Still, just
zero-extending _359 is perhaps cheaper register pressure-wise.



That said, I can't find any measurable differences between the two.


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-15 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #2 from wbrana wbrana at gmail dot com 2012-11-15 16:12:57 UTC ---

Created attachment 28699

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28699

function Assignment without 175752


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-15 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #3 from wbrana wbrana at gmail dot com 2012-11-15 16:16:05 UTC ---

Created attachment 28700

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28700

function Assignment with 175752



According to gprof, Assignment is called
1574 times without 175752
1449 times with 175752


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-15 Thread wbrana at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #4 from wbrana wbrana at gmail dot com 2012-11-15 17:01:22 UTC ---

Bytemark source code

http://www.tux.org/~mayer/linux/nbench-byte-2.2.3.tar.gz


[Bug tree-optimization/55286] [4.7/4.8 Regression] Bytemark ASSIGNMENT 4% - 10% slower

2012-11-12 Thread mikpe at it dot uu.se


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55286



--- Comment #1 from Mikael Pettersson mikpe at it dot uu.se 2012-11-12 13:44:32 UTC ---

r175752 is a follow-up fix to r175589, so my guess is that it's the combination

of the two that's causing the regression.



Can you construct a small test case that demonstrates the code quality

regression from these two revisions?