[Bug middle-end/44382] Slow integer multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

Bill Schmidt changed:

           What       |Removed     |Added
           Status     |NEW         |RESOLVED
           Resolution |---         |FIXED

--- Comment #13 from Bill Schmidt ---
Fixed, then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #12 from Bill Schmidt ---
I'd propose that this bug can now be closed. If nobody objects, I'll do that
later this week.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

acsawdey at gcc dot gnu.org changed:

           What |Removed |Added
           CC   |        |acsawdey at gcc dot gnu.org

--- Comment #11 from acsawdey at gcc dot gnu.org ---
Revision 236043 adds rs6000_reassociation_width() to enable parallel
reassociation for ppc64/ppc64le with -mcpu=power8. The width is picked to
balance being wide enough to exploit the core's resources against being so
wide that we generate a lot of excess spills. SPEC 2006 int is pretty
neutral, but fp sees some gains, largely because lbm gets a big improvement
from this.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #10 from William J. Schmidt 2011-10-21 14:41:13 UTC ---
One more data point: I repeated the experiment using -fsched-pressure.
Although this reduced the degradations considerably, the overall results are
equivocal. I see a few improvements and a few degradations in the 1-4% range,
with the geometric means essentially unchanged. So even if -fsched-pressure
were the default, there wouldn't be an overwhelming case for enabling this
support on powerpc64.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #9 from William J. Schmidt 2011-10-13 17:30:14 UTC ---
Just adding some status information well after the fact...

We experimented with adding powerpc64 hooks to use the parallel reassociation
support from comment #8. We elected not to enable it because the results for
SPEC were negative (quite negative in some cases), due to increased register
pressure in loops where spill was already an issue.

Our plan at this point is to live with the left-linear association, at least
until the spill costs can be mitigated in some fashion. If parallel
reassociation had heuristics that predicted register pressure (difficult in
tree-ssa, I know), it might become practical for us.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #8 from hjl at gcc dot gnu.org 2011-09-06 16:42:56 UTC ---
Author: hjl
Date: Tue Sep  6 16:42:47 2011
New Revision: 178602

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178602
Log:
PR middle-end/44382: Tree reassociation improvement

gcc/

2011-09-06  Enkovich Ilya

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): Likewise.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (tree-reassoc-width): New param documented.
	* hooks.h (hook_int_uint_mode_1): New default hook.
	* hooks.c (hook_int_uint_mode_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of the new hook for
	the i386 target.
	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(swap_ops_for_binary_stmt): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(rewrite_expr_tree): Refactored.  Part of the code moved into
	swap_ops_for_binary_stmt.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of
	rewrite_expr_tree if needed.

gcc/testsuite/

2011-09-06  Enkovich Ilya

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	--param tree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Added:
	trunk/gcc/testsuite/gcc.dg/tree-ssa/reassoc-24.c
	trunk/gcc/testsuite/gcc.dg/tree-ssa/reassoc-25.c
Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/i386/i386.c
	trunk/gcc/config/i386/i386.h
	trunk/gcc/doc/invoke.texi
	trunk/gcc/doc/tm.texi
	trunk/gcc/doc/tm.texi.in
	trunk/gcc/hooks.c
	trunk/gcc/hooks.h
	trunk/gcc/params.def
	trunk/gcc/target.def
	trunk/gcc/testsuite/ChangeLog
	trunk/gcc/testsuite/gcc.dg/tree-ssa/pr38533.c
	trunk/gcc/tree-ssa-reassoc.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

William J. Schmidt changed:

           What |Removed |Added
           CC   |        |wschmidt at gcc dot gnu.org

--- Comment #7 from William J. Schmidt 2011-07-12 17:33:06 UTC ---
The test case from bug 45671 is as follows:

int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
{
  int ret;
  ret = a + b + c + d + e + f + g + h;
  return ret;
}

Compiling with -O3 results in a series of dependent add instructions to
accumulate the sum:

        add 4,3,4
        add 4,4,5
        add 4,4,6
        add 4,4,7
        add 4,4,8
        add 4,4,9
        add 4,4,10

If we regrouped to (a+b)+(c+d)+..., we could do multiple adds in parallel on
different execution units.
--- Comment #6 from hjl dot tools at gmail dot com 2010-09-15 04:29 ---
*** Bug 45671 has been marked as a duplicate of this bug. ***

hjl dot tools at gmail dot com changed:

           What |Removed |Added
           CC   |        |pthaugen at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382
--- Comment #5 from hjl dot tools at gmail dot com 2010-06-04 14:40 ---
tree-ssa-reassoc.c has:

  2. Left linearization of the expression trees, so that (A+B)+(C+D)
     becomes (((A+B)+C)+D), which is easier for us to rewrite later.
     During linearization, we place the operands of the binary
     expressions into a vector of operand_entry_t.

I think this may always generate slower code. We may not want to use many
more registers; we could limit ourselves to 2 temporaries.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382
--- Comment #4 from hjl dot tools at gmail dot com 2010-06-04 13:56 ---
(In reply to comment #3)
> Yes, reassoc linearizes instead of building a tree (saves one (or was it
> two?) registers at best).

Should we always build a tree? It may increase register pressure.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382
--- Comment #3 from rguenth at gcc dot gnu dot org 2010-06-04 13:21 ---
Yes, reassoc linearizes instead of building a tree (saves one (or was it
two?) registers at best).

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382
--- Comment #2 from hjl dot tools at gmail dot com 2010-06-04 13:08 ---
(In reply to comment #1)
> Because our tree reassoc doesn't re-associate them.

The tree reassoc pass makes it slower:

[...@gnu-6 44382]$ cat x.i
extern int a, b, c, d, e, f;

void
foo ()
{
  a = (b * c) * (d * e);
}
[...@gnu-6 44382]$ gcc -S -O2 x.i
[...@gnu-6 44382]$ cat x.s
        .file   "x.i"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    c(%rip), %eax
        imull   b(%rip), %eax
        imull   d(%rip), %eax
        imull   e(%rip), %eax
        movl    %eax, a(%rip)
        ret
[...@gnu-6 44382]$ gcc -S -O2 x.i -fno-tree-reassoc
[...@gnu-6 44382]$ cat x.s
        .file   "x.i"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    b(%rip), %eax
        movl    d(%rip), %edx
        imull   c(%rip), %eax
        imull   e(%rip), %edx
        imull   %edx, %eax
        movl    %eax, a(%rip)
        ret
[...@gnu-6 44382]$

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382
--- Comment #1 from rguenth at gcc dot gnu dot org 2010-06-02 15:15 ---
Because our tree reassoc doesn't re-associate them.

rguenth at gcc dot gnu dot org changed:

           What             |Removed             |Added
           Status           |UNCONFIRMED         |NEW
           Ever Confirmed   |0                   |1
           Keywords         |                    |missed-optimization
           Last reconfirmed |0000-00-00 00:00:00 |2010-06-02 15:15:06
           date             |                    |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382