Re: [PATCH] Fix PR18589
On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Ok. Thanks, Richard. Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. gcc/testsuite: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. * gcc.dg/tree-ssa/pr18589-9.c: Likewise. * gcc.dg/tree-ssa/pr18589-10.c: Likewise. Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * y * y * y * z * z * z * z * u; +} + +/* { dg-final { scan-tree-dump-times \\* 6 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * x * y * y * y * z * z * z * z * u * u * u * u; +} + +/* { dg-final { scan-tree-dump-times \\* 6 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y) +{ + return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0); +} + +/* { dg-final { scan-tree-dump-times \\* 4 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +float baz (float x, float y) +{ + return x * x * x * x * y * y * y * y; +} + +/* { dg-final { scan-tree-dump-times \\* 3 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized
Re: [PATCH] Fix PR18589
On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. It breaks bootstrap on Linux/ia32: ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)': ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ cc1plus: all warnings being treated as errors H.J.
Re: [PATCH] Fix PR18589
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote: On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. It breaks bootstrap on Linux/ia32: ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)': ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ cc1plus: all warnings being treated as errors H.J. Whoops. Looks like I need to use HOST_WIDE_INT_PRINT_DEC instead of %ld in those spots. I'll get a fix prepared.
Re: [PATCH] Fix PR18589
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote: On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. It breaks bootstrap on Linux/ia32: ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)': ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, )^%ld\n, power); ^ cc1plus: all warnings being treated as errors H.J. Thanks, H.J. Sorry for the problem! Fixing as follows. I'll plan to commit as obvious shortly. 2012-04-12 Bill Schmidt wschm...@linux.vnet.ibm.com * tree-ssa-reassoc.c (attempt_builtin_powi_stats): Change %ld to HOST_WIDE_INT_PRINT_DEC in format strings. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 186384) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -3186,7 +3186,8 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent if (elt vec_len - 1) fputs ( * , dump_file); } - fprintf (dump_file, )^%ld\n, power); + fprintf (dump_file, )^HOST_WIDE_INT_PRINT_DEC\n, + power); } } } @@ -3219,7 +3220,7 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent if (elt vec_len - 1) fputs ( * , dump_file); } - fprintf (dump_file, )^%ld\n, power); + fprintf (dump_file, )^HOST_WIDE_INT_PRINT_DEC\n, power); } reassociate_stats.pows_created++;
Re: [PATCH] Fix PR18589
On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-04-04 at 15:08 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 2:35 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-04-04 at 13:35 +0200, Richard Guenther wrote: On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? Multiple powi results are already handled, but yes, what you're suggesting would simplify things by eliminating the need to create explicit multiplies to join them and the cached-multiply results together. Sounds reasonable on the surface; it just hadn't occurred to me to do it this way. I'll have a look. Any other major concerns while I'm reworking this? No, the rest looks fine (you should not need to repace -fdump-tree-reassoc-details with -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details in the first testcase). Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. Frankly I was surprised and relieved that there weren't more tests that used the generic -fdump-tree-reassoc. Thanks, Bill Thanks, Richard. Thanks, Bill Thanks, Richard.
Re: [PATCH] Fix PR18589
On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Hm, naming them reassoc1 and reassoc2 is a hack. Naming both reassoc will not trigger re-naming them to reassoc1 and reassoc2 I think. How ugly. Especially that -fdump-tree-reassoc will no longer work. Maybe instead of using two pass structs resort to using the existing hack with using first_pass_instance and TODO_mark_first_instance. OK, that seems to be the best among evils. Using the first_pass_instance hack, the patch is transformed as below. Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Thanks, Bill gcc: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. gcc/testsuite: 2012-04-05 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. * gcc.dg/tree-ssa/pr18589-9.c: Likewise. * gcc.dg/tree-ssa/pr18589-10.c: Likewise. Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * y * y * y * z * z * z * z * u; +} + +/* { dg-final { scan-tree-dump-times \\* 6 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * x * y * y * y * z * z * z * z * u * u * u * u; +} + +/* { dg-final { scan-tree-dump-times \\* 6 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y) +{ + return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0); +} + +/* { dg-final { scan-tree-dump-times \\* 4 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +float baz (float x, float y) +{ + return x * x * x * x * y * y * y * y; +} + +/* { dg-final { scan-tree-dump-times \\* 3 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c
Re: [PATCH] Fix PR18589
On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? Thanks, Richard. Thanks, Bill gcc: 2012-04-03 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-pass.h: Replace pass_reassoc with pass_early_reassoc and pass_late_reassoc. * passes.c (init_optimization_passes): Change pass_reassoc calls to pass_early_reassoc and pass_late_reassoc. * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (early_reassoc): New static var. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Attempt to create __builtin_powi calls, and multiply their results by any leftover reassociated factors; remove builtin pow/powi calls that were absorbed by reassociation. (fini_reassoc): Two new calls to statistics_counter_event. (execute_early_reassoc): New function. (execute_late_reassoc): Likewise. (pass_early_reassoc): Replace pass_reassoc, renamed to reassoc1, call execute_early_reassoc. (pass_late_reassoc): New gimple_opt_pass named reassoc2 that calls execute_late_reassoc. gcc/testsuite: 2012-04-03 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * gcc.dg/pr46309.c: Change -fdump-tree-reassoc-details to -fdump-tree-reassoc[12]-details. * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. * gcc.dg/tree-ssa/pr18589-9.c: Likewise. * gcc.dg/tree-ssa/pr18589-10.c: Likewise. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 186108) +++ gcc/tree-pass.h (working copy) @@ -441,7 +441,8 @@ extern struct gimple_opt_pass pass_copy_prop; extern struct gimple_opt_pass pass_vrp; extern struct gimple_opt_pass pass_uncprop; extern struct gimple_opt_pass pass_return_slot; -extern struct gimple_opt_pass pass_reassoc; +extern struct gimple_opt_pass pass_early_reassoc; +extern struct gimple_opt_pass pass_late_reassoc;
Re: [PATCH] Fix PR18589
On Wed, 2012-04-04 at 13:35 +0200, Richard Guenther wrote: On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? Multiple powi results are already handled, but yes, what you're suggesting would simplify things by eliminating the need to create explicit multiplies to join them and the cached-multiply results together. Sounds reasonable on the surface; it just hadn't occurred to me to do it this way. I'll have a look. Any other major concerns while I'm reworking this? Thanks, Bill Thanks, Richard.
Re: [PATCH] Fix PR18589
On Wed, Apr 4, 2012 at 2:35 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-04-04 at 13:35 +0200, Richard Guenther wrote: On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? Multiple powi results are already handled, but yes, what you're suggesting would simplify things by eliminating the need to create explicit multiplies to join them and the cached-multiply results together. Sounds reasonable on the surface; it just hadn't occurred to me to do it this way. I'll have a look. Any other major concerns while I'm reworking this? No, the rest looks fine (you should not need to repace -fdump-tree-reassoc-details with -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details in the first testcase). Thanks, Richard. Thanks, Bill Thanks, Richard.
Re: [PATCH] Fix PR18589
On Wed, 2012-04-04 at 15:08 +0200, Richard Guenther wrote: On Wed, Apr 4, 2012 at 2:35 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-04-04 at 13:35 +0200, Richard Guenther wrote: On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? Multiple powi results are already handled, but yes, what you're suggesting would simplify things by eliminating the need to create explicit multiplies to join them and the cached-multiply results together. Sounds reasonable on the surface; it just hadn't occurred to me to do it this way. I'll have a look. Any other major concerns while I'm reworking this? No, the rest looks fine (you should not need to repace -fdump-tree-reassoc-details with -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details in the first testcase). Unfortunately this seems to be necessary if I name the two passes reassoc1 and reassoc2. If I try to name both of them reassoc I get failures in other tests like gfortran.dg/reassoc_4, where -fdump-tree-reassoc1 doesn't work. Unless I'm missing something obvious, I think I need to keep that change. Frankly I was surprised and relieved that there weren't more tests that used the generic -fdump-tree-reassoc. Thanks, Bill Thanks, Richard. Thanks, Bill Thanks, Richard.
Re: [PATCH] Fix PR18589
On Wed, 2012-04-04 at 13:35 +0200, Richard Guenther wrote: On Tue, Apr 3, 2012 at 10:25 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Yes indeed. A few observations though. You didn't integrate attempt_builtin_powi with optimize_ops_list - presumably because it's result does not really fit the single-operation assumption? But note that undistribute_ops_list and optimize_range_tests have the same issue. Thus, I'd have prefered if attempt_builtin_powi worked in the same way, remove the parts of the ops list it consumed and stick an operand for its result there instead. That should simplify things (not having that special powi_result) and allow for multiple powi results in a single op list? An excellent suggestion. I've implemented it below and it is indeed much cleaner this way. Bootstrapped/regression tested with no new failures on powerpc64-linux. Is this incarnation OK for trunk? Thanks, Bill Thanks, Richard. Thanks, Bill gcc: 2012-04-04 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-pass.h: Replace pass_reassoc with pass_early_reassoc and pass_late_reassoc. * passes.c (init_optimization_passes): Change pass_reassoc calls to pass_early_reassoc and pass_late_reassoc. * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (early_reassoc): New static var. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Call attempt_builtin_powi. (fini_reassoc): Two new calls to statistics_counter_event. (execute_early_reassoc): New function. (execute_late_reassoc): Likewise. (pass_early_reassoc): Rename from pass_reassoc, call execute_early_reassoc. (pass_late_reassoc): New gimple_opt_pass that calls execute_late_reassoc. gcc/testsuite: 2012-04-04 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * gcc.dg/pr46309.c: Change -fdump-tree-reassoc-details to -fdump-tree-reassoc1-details and -fdump-tree-reassoc2-details. * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. * gcc.dg/tree-ssa/pr18589-9.c: Likewise. * gcc.dg/tree-ssa/pr18589-10.c: Likewise. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 186108) +++ gcc/tree-pass.h (working copy) @@ -441,7 +441,8 @@ extern struct gimple_opt_pass pass_copy_prop; extern struct gimple_opt_pass pass_vrp; extern struct gimple_opt_pass pass_uncprop; extern struct gimple_opt_pass pass_return_slot; -extern struct gimple_opt_pass pass_reassoc; +extern struct gimple_opt_pass pass_early_reassoc; +extern struct gimple_opt_pass pass_late_reassoc; extern struct gimple_opt_pass pass_rebuild_cgraph_edges; extern struct gimple_opt_pass pass_remove_cgraph_callee_edges; extern struct gimple_opt_pass pass_build_cgraph_edges; Index: gcc/testsuite/gcc.dg/pr46309.c === --- gcc/testsuite/gcc.dg/pr46309.c (revision 186108) +++ gcc/testsuite/gcc.dg/pr46309.c (working copy) @@ -1,6 +1,6 @@ /* PR tree-optimization/46309 */ /* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-reassoc-details } */ +/* { dg-options -O2 -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details } */ /* The transformation depends on BRANCH_COST being greater than 1 (see the notes in the PR), so try to force that. */ /* { dg-additional-options -mtune=octeon2 { target mips*-*-* } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c
Re: [PATCH] Fix PR18589
On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Hi Richard, I've revised my patch along these lines; see the new version below. While testing it I realized I could do a better job of reducing the number of multiplies, so there are some changes to that logic as well, and a couple of additional test cases. Regstrapped successfully on powerpc64-linux. Hope this looks better! Thanks, Bill gcc: 2012-04-03 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * tree-pass.h: Replace pass_reassoc with pass_early_reassoc and pass_late_reassoc. * passes.c (init_optimization_passes): Change pass_reassoc calls to pass_early_reassoc and pass_late_reassoc. * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (operand_entry): Add count field. (early_reassoc): New static var. (add_repeat_to_ops_vec): New function. (completely_remove_stmt): Likewise. (remove_def_if_absorbed_call): Likewise. (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. (acceptable_pow_call): New function. (linearize_expr_tree): Look for builtin pow/powi calls and add operand entries with repeat counts when found. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Attempt to create __builtin_powi calls, and multiply their results by any leftover reassociated factors; remove builtin pow/powi calls that were absorbed by reassociation. (fini_reassoc): Two new calls to statistics_counter_event. (execute_early_reassoc): New function. (execute_late_reassoc): Likewise. (pass_early_reassoc): Replace pass_reassoc, renamed to reassoc1, call execute_early_reassoc. (pass_late_reassoc): New gimple_opt_pass named reassoc2 that calls execute_late_reassoc. gcc/testsuite: 2012-04-03 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/18589 * gcc.dg/pr46309.c: Change -fdump-tree-reassoc-details to -fdump-tree-reassoc[12]-details. * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. * gcc.dg/tree-ssa/pr18589-9.c: Likewise. * gcc.dg/tree-ssa/pr18589-10.c: Likewise. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 186108) +++ gcc/tree-pass.h (working copy) @@ -441,7 +441,8 @@ extern struct gimple_opt_pass pass_copy_prop; extern struct gimple_opt_pass pass_vrp; extern struct gimple_opt_pass pass_uncprop; extern struct gimple_opt_pass pass_return_slot; -extern struct gimple_opt_pass pass_reassoc; +extern struct gimple_opt_pass pass_early_reassoc; +extern struct gimple_opt_pass pass_late_reassoc; extern struct gimple_opt_pass pass_rebuild_cgraph_edges; extern struct gimple_opt_pass pass_remove_cgraph_callee_edges; extern struct gimple_opt_pass pass_build_cgraph_edges; Index: gcc/testsuite/gcc.dg/pr46309.c === --- gcc/testsuite/gcc.dg/pr46309.c (revision 186108) +++ gcc/testsuite/gcc.dg/pr46309.c (working copy) @@ -1,6 +1,6 @@ /* PR tree-optimization/46309 */ /* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-reassoc-details } */ +/* { dg-options -O2 -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details } */ /* The transformation depends on BRANCH_COST being greater than 1 (see the
Re: [PATCH] Fix PR18589
On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Thanks, Richard. Bill gcc: 2012-03-06 Bill Schmidt wschm...@linux.vnet.ibm.com * tree-pass.h: Replace pass_reassoc with pass_early_reassoc and pass_late_reassoc. * passes.c (init_optimization_passes): Change pass_reassoc calls to pass_early_reassoc and pass_late_reassoc. * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (early_reassoc): New static var. (MAX_POW_EXPAND): New #define'd constant. (linear_expand_pow_common): New function. (linear_expand_powi): Likewise. (linear_expand_pow): Likewise. (break_up_subtract_bb): Attempt to expand __builtin_pow[i]. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Attempt to reconstitute __builtin_powi calls, and multiply their results by any leftover reassociated factors. (fini_reassoc): Two new calls to statistics_counter_event. (execute_early_reassoc): New function. (execute_late_reassoc): Likewise. (pass_early_reassoc): Replace pass_reassoc, renamed to reassoc1, call execute_early_reassoc. (pass_late_reassoc): New gimple_opt_pass named reassoc2 that calls execute_late_reassoc. gcc/testsuite: 2012-03-06 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/pr46309.c: Change -fdump-tree-reassoc-details to -fdump-tree-reassoc[12]-details. * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 184997) +++ gcc/tree-pass.h (working copy) @@ -440,7 +440,8 @@ extern struct gimple_opt_pass pass_copy_prop; extern struct gimple_opt_pass pass_vrp; extern struct gimple_opt_pass pass_uncprop; extern struct gimple_opt_pass pass_return_slot; -extern struct gimple_opt_pass pass_reassoc; +extern struct gimple_opt_pass pass_early_reassoc; +extern struct gimple_opt_pass pass_late_reassoc; extern struct gimple_opt_pass pass_rebuild_cgraph_edges; extern struct gimple_opt_pass pass_remove_cgraph_callee_edges; extern struct gimple_opt_pass pass_build_cgraph_edges; Index: gcc/testsuite/gcc.dg/pr46309.c === --- gcc/testsuite/gcc.dg/pr46309.c (revision 184997) +++ gcc/testsuite/gcc.dg/pr46309.c (working copy) @@ -1,6 +1,6 @@ /* PR tree-optimization/46309 */ /* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-reassoc-details } */ +/* { dg-options -O2 -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details } */ /* The transformation depends on BRANCH_COST being greater than 1 (see the notes in the PR), so try to force that. */ /* { dg-additional-options -mtune=octeon2 { target mips*-*-* } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * y * y * y * z * z * z * z * u; +} + +/* { dg-final { scan-tree-dump-times \\* 7 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c ===
Re: [PATCH] Fix PR18589
On Wed, 2012-03-28 at 15:57 +0200, Richard Guenther wrote: On Tue, Mar 6, 2012 at 9:49 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Hmm. How much work would it be to extend the reassoc 'IL' to allow a repeat factor per op? I realize what you do is all within what reassoc already does though ideally we would not require any GIMPLE IL changes for building up / optimizing the reassoc IL but only do so when we commit changes. Ah, I take your point. I will look into it. We still need the additional data structures to allow sorting by factor repeat counts, but perhaps expanding the builtins can be avoided until it's proven necessary. The patch as submitted may be slightly easier to implement and understand, but I agree it would be better to avoid changing GIMPLE unnecessarily if possible. I'll get back to you shortly. Thanks, Bill Thanks, Richard.
[PATCH] Fix PR18589
Hi, This is a re-post of the patch I posted for comments in January to address http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18589. The patch modifies reassociation to expose repeated factors from __builtin_pow* calls, optimally reassociate repeated factors, and possibly reconstitute __builtin_powi calls from the results of reassociation. Bootstrapped and passes regression tests for powerpc64-linux-gnu. I expect there may need to be some small changes, but I am targeting this for trunk approval. Thanks very much for the review, Bill gcc: 2012-03-06 Bill Schmidt wschm...@linux.vnet.ibm.com * tree-pass.h: Replace pass_reassoc with pass_early_reassoc and pass_late_reassoc. * passes.c (init_optimization_passes): Change pass_reassoc calls to pass_early_reassoc and pass_late_reassoc. * tree-ssa-reassoc.c (reassociate_stats): Add two fields. (early_reassoc): New static var. (MAX_POW_EXPAND): New #define'd constant. (linear_expand_pow_common): New function. (linear_expand_powi): Likewise. (linear_expand_pow): Likewise. (break_up_subtract_bb): Attempt to expand __builtin_pow[i]. (repeat_factor_d): New struct and associated typedefs. (repeat_factor_vec): New static vector variable. (compare_repeat_factors): New function. (get_reassoc_pow_ssa_name): Likewise. (attempt_builtin_powi): Likewise. (reassociate_bb): Attempt to reconstitute __builtin_powi calls, and multiply their results by any leftover reassociated factors. (fini_reassoc): Two new calls to statistics_counter_event. (execute_early_reassoc): New function. (execute_late_reassoc): Likewise. (pass_early_reassoc): Replace pass_reassoc, renamed to reassoc1, call execute_early_reassoc. (pass_late_reassoc): New gimple_opt_pass named reassoc2 that calls execute_late_reassoc. gcc/testsuite: 2012-03-06 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/pr46309.c: Change -fdump-tree-reassoc-details to -fdump-tree-reassoc[12]-details. * gcc.dg/tree-ssa/pr18589-1.c: New test. * gcc.dg/tree-ssa/pr18589-2.c: Likewise. * gcc.dg/tree-ssa/pr18589-3.c: Likewise. * gcc.dg/tree-ssa/pr18589-4.c: Likewise. * gcc.dg/tree-ssa/pr18589-5.c: Likewise. * gcc.dg/tree-ssa/pr18589-6.c: Likewise. * gcc.dg/tree-ssa/pr18589-7.c: Likewise. * gcc.dg/tree-ssa/pr18589-8.c: Likewise. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 184997) +++ gcc/tree-pass.h (working copy) @@ -440,7 +440,8 @@ extern struct gimple_opt_pass pass_copy_prop; extern struct gimple_opt_pass pass_vrp; extern struct gimple_opt_pass pass_uncprop; extern struct gimple_opt_pass pass_return_slot; -extern struct gimple_opt_pass pass_reassoc; +extern struct gimple_opt_pass pass_early_reassoc; +extern struct gimple_opt_pass pass_late_reassoc; extern struct gimple_opt_pass pass_rebuild_cgraph_edges; extern struct gimple_opt_pass pass_remove_cgraph_callee_edges; extern struct gimple_opt_pass pass_build_cgraph_edges; Index: gcc/testsuite/gcc.dg/pr46309.c === --- gcc/testsuite/gcc.dg/pr46309.c (revision 184997) +++ gcc/testsuite/gcc.dg/pr46309.c (working copy) @@ -1,6 +1,6 @@ /* PR tree-optimization/46309 */ /* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-reassoc-details } */ +/* { dg-options -O2 -fdump-tree-reassoc1-details -fdump-tree-reassoc2-details } */ /* The transformation depends on BRANCH_COST being greater than 1 (see the notes in the PR), so try to force that. */ /* { dg-additional-options -mtune=octeon2 { target mips*-*-* } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * y * y * y * z * z * z * z * u; +} + +/* { dg-final { scan-tree-dump-times \\* 7 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ffast-math -fdump-tree-optimized } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * x * y * y * y * z * z * z * z * u * u * u * u; +} + +/* { dg-final { scan-tree-dump-times \\* 6 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized }