[Bug tree-optimization/56175] Issue with combine phase on x86.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #12 from Richard Biener rguenth at gcc dot gnu.org 2013-02-21 13:44:09 UTC ---
For the real testcase I see

.L2:
        shrw    %ax
        movl    %edi, %edx
        subb    $1, %dl
        movl    %edx, %edi
        je      .L9
.L4:
        movl    %ecx, %esi
        movl    %eax, %ebx
        andl    $1, %esi
        andl    $1, %ebx
        shrb    %cl
        movl    %esi, %edx
        cmpb    %bl, %dl
        je      .L2

thus

        andl    $1, %esi
        andl    $1, %ebx
        cmpb    %bl, %dl

for

  t = (u8)((x & 1) ^ ((u8)y & 1));
  if (t == 1)

and with the forwprop transformation disabled:

.L2:
        shrw    %ax
        subb    $1, %dl
        je      .L9
.L4:
        movl    %ecx, %ebx
        shrb    %cl
        xorl    %eax, %ebx
        andl    $1, %ebx
        je      .L2

This confirms the issue again: one fewer register is used, and the conditional jump uses the zero flag set by the andl directly instead of a separate cmpb.

The following testcase is simple enough that it gets optimized at the RTL level anyway, but it may serve as a testcase for forwprop.

void bar (void);
unsigned short
foo (unsigned char x, unsigned short y)
{
  unsigned char t = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (t == 1)
    bar ();
  return y;
}
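The shorter xorl/andl sequence computes (x ^ y) & 1 directly. As a sanity check that this is the same predicate as the testcase's t, here is a minimal C sketch, not part of the bug report. The helper names t_original, t_simplified and forms_agree are hypothetical. It compares both forms over every unsigned char x and unsigned short y:

```c
#include <assert.h>
#include <stdbool.h>

/* t as written in the testcase: mask each operand, then XOR. */
static unsigned char t_original(unsigned char x, unsigned short y)
{
    return (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
}

/* t as computed by the xorl/andl sequence: XOR first, then a single mask. */
static unsigned char t_simplified(unsigned char x, unsigned short y)
{
    return (unsigned char)((x ^ y) & 1);
}

/* Exhaustively compare both forms; returns true if they never differ. */
static bool forms_agree(void)
{
    for (unsigned x = 0; x <= 0xFFu; x++)
        for (unsigned y = 0; y <= 0xFFFFu; y++)
            if (t_original((unsigned char)x, (unsigned short)y)
                != t_simplified((unsigned char)x, (unsigned short)y))
                return false;
    return true;
}
```

Calling assert(forms_agree()) from a driver succeeds. The check only shows that dropping one andl and the cmpb is value-preserving. It says nothing about which GCC pass should perform the rewrite.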
--- Comment #11 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-14 12:03:37 UTC ---
I measured 3 possible fixes:
1. Comment out the 2 patterns related to type sinking.
2. Comment out the 1st pattern only.
3. Prohibit type sinking if the source type (of def_arg1) is a short type.

Measurements were done on the EEMBC 2.0 suite with the base optset, and they showed that the 3rd fix is the most profitable for x86 in 32-bit mode.

Since I have heard nothing from the code owner, I assume that we will add a new target hook, returning true/false, for type sinking in both patterns; it will analyze the source type and likely the destination type of the operand.

Richard, what is your opinion?
--- Comment #7 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-12 13:05:16 UTC ---
(In reply to comment #6)
> Ah, well. The issue is that we transformed (unsigned char)y & 1 to
> (unsigned char)(y & 1).

Hi Richard,

We'd like to fix this issue, since we can get a +10.5% speedup on Atom. What is your opinion on how best to fix the issue with the 1st pattern in simplify_bitwise_binary? I have no idea why GCC does this transformation or what we gain from it: a smaller constant, or more opportunities for CSE?

I can propose the following possible changes:
1. Introduce a hook for doing this transformation.
2. Introduce a new forwprop pass that does not do this transformation.
3. Do not perform this transformation for a small positive constant.
4. Do not perform this transformation if (type-x) c == c.
etc.

Any help will be appreciated.
Yuri.
--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org 2013-02-12 13:25:59 UTC ---
(In reply to comment #7)
> I have no idea why gcc does such transformation and what gain we can get from
> it - decrease size of constant or create more opportunities for cse?

Well, you'd have to track down what is responsible for that transform. Generally, promoting operations (and automatic vars) to word mode may be beneficial on most targets, but that should be done late.

> I can propose the following possible changes:
> 1. Introduce a hook for doing such transformation.
> 2. Introduce a new forwprop pass that does not do such transformation.
> 3. Do not perform such transformation for small positive constant.
> 4. Do not perform such transformation if (type-x) c == c.
> etc.

First track it down ;)
--- Comment #9 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-12 14:43:53 UTC ---
(In reply to comment #8)
> Well, you'd have to track down what is responsible for that transform.
> Generally promoting operations (and automatic vars) to word-mode may be
> beneficial on most targets. But that should be done late.

Richard,

I am familiar with the type promotion transformation that can, e.g., transform a byte loop counter into a word, but that is done by other phases, e.g. LTO.
We found the owner of this change: http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
What are our next steps?

Thanks in advance.
Yuri.
Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |ktietz at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-12 14:46:50 UTC ---
For 4.9, Kai is working on type promotion/demotion GIMPLE pass(es), so this issue can also be taken into account when that change is discussed.
--- Comment #5 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-02-11 13:42:49 UTC ---
This pattern is already recognized by simplify_bitwise_binary, but only for the usual int type; i.e., if we change all short types to ordinary int (or unsigned), this simplification takes place (dump after the 1st forwprop):

  <bb 4>:
  x_8 = x_2(D) 1;
  y_9 = y_4(D) 1;
  _10 = x_8 & 1;
  _11 = y_9 & 1;
  _16 = x_8 ^ y_9;
  z_12 = _16 & 1;

i.e., the issue is the redundant type conversions:

  <bb 3>:
  x_7 = x_2(D) 1;
  y_8 = y_4(D) 1;
  _13 = x_7 & 1;
  _9 = (signed char) _13;
  _14 = y_8 & 1;
  _10 = (signed char) _14;
  _11 = _9 ^ _10;

I assume that if we delete these redundant conversions, the required simplification will happen.
--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org 2013-02-11 14:38:37 UTC ---
(In reply to comment #5)
> I assume that if we delete these redundant conversions the required
> simplification will happen.

Ah, well. The issue is that we transformed (unsigned char)y & 1 to (unsigned char)(y & 1).
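The transform in question rewrites (unsigned char)y & 1 as (unsigned char)(y & 1), moving the conversion outside the &. The following is a minimal C sketch, not GCC code, using the hypothetical helpers convert_then_mask, mask_then_convert and sinking_preserves_value. It shows that both forms agree for every 16-bit y, so the rewrite itself is value-preserving:

```c
#include <assert.h>
#include <stdbool.h>

/* Before the transform: convert to the narrow type, then mask. */
static unsigned char convert_then_mask(unsigned short y)
{
    return (unsigned char)((unsigned char)y & 1);
}

/* After the transform: mask in the wide type, then convert
   (the conversion now sits outside the &). */
static unsigned char mask_then_convert(unsigned short y)
{
    return (unsigned char)(y & 1);
}

/* Exhaustively compare both forms over all 16-bit inputs. */
static bool sinking_preserves_value(void)
{
    for (unsigned y = 0; y <= 0xFFFFu; y++)
        if (convert_then_mask((unsigned short)y)
            != mask_then_convert((unsigned short)y))
            return false;
    return true;
}
```

Because assert(sinking_preserves_value()) succeeds, the problem is not correctness. The rewrite leaves a conversion between each & and the ^, as in the <bb 3> dump of comment #5. Per that comment, this keeps the (A & C) ^ (B & C) simplification from applying.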
Richard Biener rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-04
          Component|rtl-optimization            |tree-optimization
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #4 from Richard Biener rguenth at gcc dot gnu.org 2013-02-04 10:10:31 UTC ---
This should be fixed on the GIMPLE level by simplify_bitwise_binary. That is,

  (A & C) ^ (B & C) -> (A ^ B) & C

for all code combinations and C's that this is valid for. fold doesn't seem to have this complex pattern.
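Comment #4 states the pattern generically, "for all code combinations and C's that this is valid for". The following C sketch is an illustration only, not GCC code. The names bitop, apply and pattern_holds are hypothetical. It checks which combinations of the bitwise operators &, | and ^ satisfy (A inner C) outer (B inner C) == (A outer B) inner C. These operators act on each bit independently, so checking one-bit operands covers every width:

```c
#include <assert.h>
#include <stdbool.h>

/* The three bitwise operators considered. */
enum bitop { OP_AND, OP_IOR, OP_XOR };

static unsigned apply(enum bitop op, unsigned a, unsigned b)
{
    switch (op) {
    case OP_AND: return a & b;
    case OP_IOR: return a | b;
    default:     return a ^ b;
    }
}

/* Does (A inner C) outer (B inner C) == (A outer B) inner C hold?
   Bitwise operators treat each bit separately, so 1-bit A, B, C suffice. */
static bool pattern_holds(enum bitop outer, enum bitop inner)
{
    for (unsigned a = 0; a < 2; a++)
        for (unsigned b = 0; b < 2; b++)
            for (unsigned c = 0; c < 2; c++) {
                unsigned lhs = apply(outer, apply(inner, a, c), apply(inner, b, c));
                unsigned rhs = apply(inner, apply(outer, a, b), c);
                if (lhs != rhs)
                    return false;
            }
    return true;
}
```

The check accepts five of the nine combinations, including pattern_holds(OP_XOR, OP_AND), the XOR-of-ANDs case from this bug. It rejects every combination with ^ as the inner operator, and also (A | C) ^ (B | C).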