[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-21 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org



--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-21 13:44:09 UTC ---

For the real testcase I see



.L2:
        shrw    %ax
        movl    %edi, %edx
        subb    $1, %dl
        movl    %edx, %edi
        je      .L9
.L4:
        movl    %ecx, %esi
        movl    %eax, %ebx
        andl    $1, %esi
        andl    $1, %ebx
        shrb    %cl
        movl    %esi, %edx
        cmpb    %bl, %dl
        je      .L2



thus



        andl    $1, %esi
        andl    $1, %ebx
        cmpb    %bl, %dl



for



   t = (u8)((x & 1) ^ ((u8)y & 1));
   if (t == 1)



and with disabling the forwprop transformation:



.L2:
        shrw    %ax
        subb    $1, %dl
        je      .L9
.L4:
        movl    %ecx, %ebx
        shrb    %cl
        xorl    %eax, %ebx
        andl    $1, %ebx
        je      .L2



to confirm the issue again.  One fewer register is used, and the
conditional jump uses the zero flag set by the andl directly.



The following testcase is simple enough that it gets optimized at the
RTL level anyway, but it may serve as a testcase for forwprop.



void bar (void);
unsigned short
foo (unsigned char x, unsigned short y)
{
  unsigned char t = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (t == 1)
    bar ();
  return y;
}


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-14 Thread ysrumyan at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #11 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-14 12:03:37 UTC ---

I did measurements of 3 possible fixes:



1. Comment out 2 patterns related to type sinking.

2. Comment out 1st pattern only.

3. Prohibit type sinking if source type (of def_arg1) is short type.



Measurements were done on the EEMBC 2.0 suite with the base option set, and
they showed that the 3rd fix is the most profitable for x86 in 32-bit mode.



Since I have heard nothing from the code owner, I assume that we will add a
new target hook, returning true/false for type sinking in both patterns, that
will analyze the source type and the likely destination type of the operand.



Richard, what is your opinion?


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-12 Thread ysrumyan at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #7 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-12 13:05:16 UTC ---

(In reply to comment #6)
> (In reply to comment #5)
> > This pattern is already recognized by simplify_bitwise_binary but only for
> > usual int type, i.e. if we change all short types to the ordinary int (or
> > unsigned) this simplification takes place (dump after 1st forwprop):
> >
> >   <bb 4>:
> >   x_8 = x_2(D) >> 1;
> >   y_9 = y_4(D) >> 1;
> >   _10 = x_8 & 1;
> >   _11 = y_9 & 1;
> >   _16 = x_8 ^ y_9;
> >   z_12 = _16 & 1;
> >
> > i.e. the issue is redundant type conversions:
> >
> >   <bb 3>:
> >   x_7 = x_2(D) >> 1;
> >   y_8 = y_4(D) >> 1;
> >   _13 = x_7 & 1;
> >   _9 = (signed char) _13;
> >   _14 = y_8 & 1;
> >   _10 = (signed char) _14;
> >   _11 = _9 ^ _10;
> >
> > I assume that if we delete these redundant conversions the required
> > simplification will happen.
>
> Ah, well.  The issue is that we transformed (unsigned char)y & 1
> to (unsigned char)(y & 1).



Hi Richard,



We'd like to fix this issue since we can get a +10.5% speedup on Atom.
What is your opinion on how best to fix this issue with the 1st pattern in
simplify_bitwise_binary?

I have no idea why GCC does such a transformation and what gain we can get
from it - decrease the size of the constant or create more opportunities for
CSE?
I can propose the following possible changes:

1. Introduce a hook for doing such a transformation.
2. Introduce a new forwprop pass that does not do such a transformation.
3. Do not perform such a transformation for a small positive constant.
4. Do not perform such a transformation if (type-x) c == c.
etc.



Any help will be appreciated.

Yuri.


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-12 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-12 13:25:59 UTC ---

(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > This pattern is already recognized by simplify_bitwise_binary but only for
> > > usual int type, i.e. if we change all short types to the ordinary int (or
> > > unsigned) this simplification takes place (dump after 1st forwprop):
> > >
> > >   <bb 4>:
> > >   x_8 = x_2(D) >> 1;
> > >   y_9 = y_4(D) >> 1;
> > >   _10 = x_8 & 1;
> > >   _11 = y_9 & 1;
> > >   _16 = x_8 ^ y_9;
> > >   z_12 = _16 & 1;
> > >
> > > i.e. the issue is redundant type conversions:
> > >
> > >   <bb 3>:
> > >   x_7 = x_2(D) >> 1;
> > >   y_8 = y_4(D) >> 1;
> > >   _13 = x_7 & 1;
> > >   _9 = (signed char) _13;
> > >   _14 = y_8 & 1;
> > >   _10 = (signed char) _14;
> > >   _11 = _9 ^ _10;
> > >
> > > I assume that if we delete these redundant conversions the required
> > > simplification will happen.
> >
> > Ah, well.  The issue is that we transformed (unsigned char)y & 1
> > to (unsigned char)(y & 1).
>
> Hi Richard,
>
> We'd like to fix this issue since we can get a +10.5% speedup on Atom.
> What is your opinion on how best to fix this issue with the 1st pattern in
> simplify_bitwise_binary?
>
> I have no idea why GCC does such a transformation and what gain we can get
> from it - decrease the size of the constant or create more opportunities for
> CSE?



Well, you'd have to track down what is responsible for that transform.



Generally promoting operations (and automatic vars) to word-mode may

be beneficial on most targets.  But that should be done late.



> I can propose the following possible changes:
>
> 1. Introduce a hook for doing such a transformation.
> 2. Introduce a new forwprop pass that does not do such a transformation.
> 3. Do not perform such a transformation for a small positive constant.
> 4. Do not perform such a transformation if (type-x) c == c.
> etc.



First track it down ;)



> Any help will be appreciated.
> Yuri.


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-12 Thread ysrumyan at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-12 14:43:53 UTC ---

(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #5)
> > > > This pattern is already recognized by simplify_bitwise_binary but only for
> > > > usual int type, i.e. if we change all short types to the ordinary int (or
> > > > unsigned) this simplification takes place (dump after 1st forwprop):
> > > >
> > > >   <bb 4>:
> > > >   x_8 = x_2(D) >> 1;
> > > >   y_9 = y_4(D) >> 1;
> > > >   _10 = x_8 & 1;
> > > >   _11 = y_9 & 1;
> > > >   _16 = x_8 ^ y_9;
> > > >   z_12 = _16 & 1;
> > > >
> > > > i.e. the issue is redundant type conversions:
> > > >
> > > >   <bb 3>:
> > > >   x_7 = x_2(D) >> 1;
> > > >   y_8 = y_4(D) >> 1;
> > > >   _13 = x_7 & 1;
> > > >   _9 = (signed char) _13;
> > > >   _14 = y_8 & 1;
> > > >   _10 = (signed char) _14;
> > > >   _11 = _9 ^ _10;
> > > >
> > > > I assume that if we delete these redundant conversions the required
> > > > simplification will happen.
> > >
> > > Ah, well.  The issue is that we transformed (unsigned char)y & 1
> > > to (unsigned char)(y & 1).
> >
> > Hi Richard,
> >
> > We'd like to fix this issue since we can get a +10.5% speedup on Atom.
> > What is your opinion on how best to fix this issue with the 1st pattern in
> > simplify_bitwise_binary?
> >
> > I have no idea why GCC does such a transformation and what gain we can get
> > from it - decrease the size of the constant or create more opportunities for
> > CSE?
>
> Well, you'd have to track down what is responsible for that transform.
>
> Generally promoting operations (and automatic vars) to word-mode may
> be beneficial on most targets.  But that should be done late.
>
> > I can propose the following possible changes:
> >
> > 1. Introduce a hook for doing such a transformation.
> > 2. Introduce a new forwprop pass that does not do such a transformation.
> > 3. Do not perform such a transformation for a small positive constant.
> > 4. Do not perform such a transformation if (type-x) c == c.
> > etc.
>
> First track it down ;)
>
> > Any help will be appreciated.
> > Yuri.



Richard,



I am familiar with the type promotion transformation that can, e.g., turn a
byte loop counter into a word, but this is done by other phases, e.g. LTO.

We found the owner of this change:

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html

What are our next steps?

Thanks in advance.
Yuri.


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-12 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |ktietz at gcc dot gnu.org



--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-12 14:46:50 UTC ---

For 4.9, Kai is working on type promotion/demotion GIMPLE pass(es), so when

discussing that change this can be also taken into account.


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-11 Thread ysrumyan at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #5 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-11 13:42:49 UTC ---

This pattern is already recognized by simplify_bitwise_binary but only for

usual int type, i.e. if we change all short types to the ordinary int (or

unsigned) this simplification takes place (dump after 1st forwprop):



  <bb 4>:
  x_8 = x_2(D) >> 1;
  y_9 = y_4(D) >> 1;
  _10 = x_8 & 1;
  _11 = y_9 & 1;
  _16 = x_8 ^ y_9;
  z_12 = _16 & 1;



i.e. the issue is redundant type conversions:



  <bb 3>:
  x_7 = x_2(D) >> 1;
  y_8 = y_4(D) >> 1;
  _13 = x_7 & 1;
  _9 = (signed char) _13;
  _14 = y_8 & 1;
  _10 = (signed char) _14;
  _11 = _9 ^ _10;



I assume that if we delete these redundant conversions the required

simplification will happen.


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-11 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-11 14:38:37 UTC ---

(In reply to comment #5)
> This pattern is already recognized by simplify_bitwise_binary but only for
> usual int type, i.e. if we change all short types to the ordinary int (or
> unsigned) this simplification takes place (dump after 1st forwprop):
>
>   <bb 4>:
>   x_8 = x_2(D) >> 1;
>   y_9 = y_4(D) >> 1;
>   _10 = x_8 & 1;
>   _11 = y_9 & 1;
>   _16 = x_8 ^ y_9;
>   z_12 = _16 & 1;
>
> i.e. the issue is redundant type conversions:
>
>   <bb 3>:
>   x_7 = x_2(D) >> 1;
>   y_8 = y_4(D) >> 1;
>   _13 = x_7 & 1;
>   _9 = (signed char) _13;
>   _14 = y_8 & 1;
>   _10 = (signed char) _14;
>   _11 = _9 ^ _10;
>
> I assume that if we delete these redundant conversions the required
> simplification will happen.



Ah, well.  The issue is that we transformed (unsigned char)y & 1
to (unsigned char)(y & 1).


[Bug tree-optimization/56175] Issue with combine phase on x86.

2013-02-04 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175



Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-04
          Component|rtl-optimization            |tree-optimization
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement



--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-04 10:10:31 UTC ---

This should be fixed on the GIMPLE level by simplify_bitwise_binary.  That is,
(A & C) ^ (B & C) -> (A ^ B) & C for all code combinations and C's that this
is valid for.  fold doesn't seem to have this complex pattern.