Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On 03/21/2018 04:43 PM, Richard Biener wrote: On Wed, 21 Mar 2018, Tom de Vries wrote: On 03/12/2018 01:14 PM, Richard Biener wrote: On Thu, 22 Feb 2018, Tom de Vries wrote: Hi, this patch fixes an ICE in the parloops pass. The ICE (when compiling the test-case in attached patch) follows from the fact that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to "base all the induction variables in LOOP on a single control one": ... /* Base all the induction variables in LOOP on a single control one.*/ canonicalize_loop_ivs (loop, &nit, true); ... This is caused by the fact that for one of the induction variables, simple_iv no longer returns true (as was the case at the start of gen_parallel_loop). This seems to be triggered by the fact that during loop_version we call scev_reset (although I'm not sure why when recalculating scev info we're not reaching the same conclusion as before). I guess the real reason is that canonicalize_loop_ivs calls create_iv first which in the parloop case with bump-in-latch true and then incrementally re-writes PHIs, eventually wrecking other PHIs SCEV. In this case the 2nd PHIs evolution depends on the first one but the rewritten ones depend on the new IV already. So the better fix (maybe not 100% enough) would be to re-organize canonicalize_loop_ivs to first do analysis of all PHIs and then rewrite them all. Focusing on the re-organize first, is this sort of what you had in mind? Yes, though not sure why you need to have a hash-map here, just pushing to a vector of PHIs would work, no? Or alternatively a bitmap of SSA versions so you have a way to match the gphi iterator with the set of IVs. But the vector should be sorted in PHI order so just traversing the PHIs looking for the "next" PHI in the vector would work I guess. Does it help with the original issue btw? Not as such. The original problem happens before canonicalize_loop_ivs. Perhaps the easiest way to demonstrate the problem is this patch: ... diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 3a788cc..801fc46 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2254,6 +2254,38 @@ num_phis (basic_block bb, bool count_virtual_p) return nr_phis; } +static unsigned +get_nr_simple_ivs (struct loop *loop) +{ + unsigned cnt = 0; + basic_block *bbs = get_loop_body_in_dom_order (loop); + + for (unsigned i = 0; i < loop->num_nodes; i++) +{ + basic_block bb = bbs[i]; + if (bb->loop_father != loop) + continue; + + for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi); + gsi_next (&psi)) + { + gphi *phi = psi.phi (); + tree res = PHI_RESULT (phi); + if (virtual_operand_p (res)) + continue; + + affine_iv iv; + if (!simple_iv (loop, loop, res, &iv, true)) + continue; + cnt++; + } +} + + free (bbs); + + return cnt; +} + /* Generates code to execute the iterations of LOOP in N_THREADS threads in parallel, which can be 0 if that number is to be determined later. @@ -2378,11 +2410,15 @@ gen_parallel_loop (struct loop *loop, initialize_original_copy_tables (); /* We assume that the loop usually iterates a lot. */ + fprintf (stderr, "nr simple ivs: %u @ line %d\n", + get_nr_simple_ivs (loop), __LINE__); loop_version (loop, many_iterations_cond, NULL, profile_probability::likely (), profile_probability::unlikely (), profile_probability::likely (), profile_probability::unlikely (), true); + fprintf (stderr, "nr simple ivs: %u @ line %d\n", + get_nr_simple_ivs (loop), __LINE__); update_ssa (TODO_update_ssa); free_original_copy_tables (); } ... and this output for pr83126.c: ... nr simple ivs: 3 @ line 2414 nr simple ivs: 2 @ line 2421 ... I've written attached follow-up patch that works around that problem by analyzing the phis before calling loop_version. It works for the example, and bootstraps fine, but I'm not familiar enough with loop_version to known whether this is safe or not. Thanks, - Tom [parloops] Fix canonicalize_loop_ivs in gen_parallel_loop 2018-03-21 Tom de Vries PR tree-optimization/83126 * tree-parloops.c (gen_parallel_loop): Inline canonicalize_loop_ivs. Assert that canonicalize_loop_ivs succeeds. * tree-ssa-loop-manip.c (get_all_phi_affine_ivs): Remove static. (struct phi_affine_iv): Move ... * tree-ssa-loop-manip.h (struct phi_affine_iv): ... here. (get_all_phi_affine_ivs, canonicalize_loop_ivs_1): Declare. --- gcc/tree-parloops.c | 27 ++- gcc/tree-ssa-loop-manip.c | 8 +--- gcc/tree-ssa-loop-manip.h | 10 ++ 3 files changed, 17 insertions(+), 28 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 3a788cc..9c2d19a 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloo
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On 03/21/2018 04:43 PM, Richard Biener wrote: On Wed, 21 Mar 2018, Tom de Vries wrote: On 03/12/2018 01:14 PM, Richard Biener wrote: On Thu, 22 Feb 2018, Tom de Vries wrote: Hi, this patch fixes an ICE in the parloops pass. The ICE (when compiling the test-case in attached patch) follows from the fact that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to "base all the induction variables in LOOP on a single control one": ... /* Base all the induction variables in LOOP on a single control one.*/ canonicalize_loop_ivs (loop, &nit, true); ... This is caused by the fact that for one of the induction variables, simple_iv no longer returns true (as was the case at the start of gen_parallel_loop). This seems to be triggered by the fact that during loop_version we call scev_reset (although I'm not sure why when recalculating scev info we're not reaching the same conclusion as before). I guess the real reason is that canonicalize_loop_ivs calls create_iv first which in the parloop case with bump-in-latch true and then incrementally re-writes PHIs, eventually wrecking other PHIs SCEV. In this case the 2nd PHIs evolution depends on the first one but the rewritten ones depend on the new IV already. So the better fix (maybe not 100% enough) would be to re-organize canonicalize_loop_ivs to first do analysis of all PHIs and then rewrite them all. Focusing on the re-organize first, is this sort of what you had in mind? Yes, though not sure why you need to have a hash-map here, just pushing to a vector of PHIs would work, no? Or alternatively a bitmap of SSA versions so you have a way to match the gphi iterator with the set of IVs. But the vector should be sorted in PHI order so just traversing the PHIs looking for the "next" PHI in the vector would work I guess. The reason I chose a hash map was robustness: to allow the maximum amount of change between analysis and transformation. I've now used a vector of this struct: ... struct phi_affine_iv { gphi *phi; affine_iv iv; }; ... and rewritten the transformation part to iterate over the vector. Bootstrapped and reg-tested on x86_64. Thanks, - Tom Split canonicalize_loop_ivs in analyze and rewrite part 2018-03-21 Tom de Vries * tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter. Add iv parameter. Remove early-out tests. (struct phi_affine_iv): New struct. (get_all_phi_affine_ivs): New function. Factor out of ... (rewrite_all_phi_nodes_with_iv): ... this. Remove loop argument. Add simple_ivs argument. Rewrite to iterate over simple_ivs. (canonicalize_loop_ivs_1): Factor out of ... (canonicalize_loop_ivs): ... this. Call get_all_phi_affine_ivs and pass result as argument to canonicalize_loop_ivs_1. --- gcc/tree-ssa-loop-manip.c | 114 ++ 1 file changed, 76 insertions(+), 38 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index bf425af..ecda325 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -1430,79 +1430,99 @@ tree_unroll_loop (struct loop *loop, unsigned factor, induction variable MAIN_IV and insert the generated code at GSI. */ static void -rewrite_phi_with_iv (loop_p loop, - gphi_iterator *psi, +rewrite_phi_with_iv (gphi_iterator *psi, gimple_stmt_iterator *gsi, - tree main_iv) + tree main_iv, affine_iv *iv) { - affine_iv iv; gassign *stmt; gphi *phi = psi->phi (); tree atype, mtype, val, res = PHI_RESULT (phi); - if (virtual_operand_p (res) || res == main_iv) -{ - gsi_next (psi); - return; -} - - if (!simple_iv (loop, loop, res, &iv, true)) -{ - gsi_next (psi); - return; -} - remove_phi_node (psi, false); atype = TREE_TYPE (res); mtype = POINTER_TYPE_P (atype) ? sizetype : atype; - val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step), + val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step), fold_convert (mtype, main_iv)); val = fold_build2 (POINTER_TYPE_P (atype) ? POINTER_PLUS_EXPR : PLUS_EXPR, - atype, unshare_expr (iv.base), val); + atype, unshare_expr (iv->base), val); val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true, GSI_SAME_STMT); stmt = gimple_build_assign (res, val); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); } -/* Rewrite all the phi nodes of LOOP in function of the main induction - variable MAIN_IV. */ +struct phi_affine_iv +{ + gphi *phi; + affine_iv iv; +}; + +/* Return map of phi node result to affine_iv, for all phis in LOOPS. */ static void -rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv) +get_all_phi_affine_ivs (struct loop *loop, + auto_vec *simple_ivs) { - unsigned i; basic_block *bbs = get_loop_body_in_dom_order (loop); - gphi_iterator psi; - for (i = 0; i < loop->num_nodes; i++) + for (unsigned i = 0; i < loop->num_nodes; i++) { basic_bl
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On Wed, 21 Mar 2018, Tom de Vries wrote: > On 03/12/2018 01:14 PM, Richard Biener wrote: > > On Thu, 22 Feb 2018, Tom de Vries wrote: > > > > > Hi, > > > > > > this patch fixes an ICE in the parloops pass. > > > > > > The ICE (when compiling the test-case in attached patch) follows from the > > > fact > > > that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to > > > "base all the induction variables in LOOP on a single control one": > > > ... > > >/* Base all the induction variables in LOOP on a single control one.*/ > > >canonicalize_loop_ivs (loop, &nit, true); > > > ... > > > > > > This is caused by the fact that for one of the induction variables, > > > simple_iv > > > no longer returns true (as was the case at the start of > > > gen_parallel_loop). > > > This seems to be triggered by the fact that during loop_version we call > > > scev_reset (although I'm not sure why when recalculating scev info we're > > > not > > > reaching the same conclusion as before). > > I guess the real reason is that canonicalize_loop_ivs calls > > create_iv first which in the parloop case with bump-in-latch true > > and then incrementally re-writes PHIs, eventually wrecking other > > PHIs SCEV. In this case the 2nd PHIs evolution depends on the > > first one but the rewritten ones depend on the new IV already. > > > > So the better fix (maybe not 100% enough) would be to re-organize > > canonicalize_loop_ivs to first do analysis of all PHIs and then > > rewrite them all. > > > > Focusing on the re-organize first, is this sort of what you had in mind? Yes, though not sure why you need to have a hash-map here, just pushing to a vector of PHIs would work, no? Or alternatively a bitmap of SSA versions so you have a way to match the gphi iterator with the set of IVs. But the vector should be sorted in PHI order so just traversing the PHIs looking for the "next" PHI in the vector would work I guess. Does it help with the original issue btw? Richard. > Bootstrapped and reg-tested on x86_64. > > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On Wed, Mar 21, 2018 at 3:06 PM, Tom de Vries wrote: > On 03/12/2018 01:14 PM, Richard Biener wrote: >> >> On Thu, 22 Feb 2018, Tom de Vries wrote: >> >>> Hi, >>> >>> this patch fixes an ICE in the parloops pass. >>> >>> The ICE (when compiling the test-case in attached patch) follows from the >>> fact >>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to >>> "base all the induction variables in LOOP on a single control one": >>> ... >>>/* Base all the induction variables in LOOP on a single control one.*/ >>>canonicalize_loop_ivs (loop, &nit, true); >>> ... >>> >>> This is caused by the fact that for one of the induction variables, >>> simple_iv >>> no longer returns true (as was the case at the start of >>> gen_parallel_loop). >>> This seems to be triggered by the fact that during loop_version we call >>> scev_reset (although I'm not sure why when recalculating scev info we're >>> not >>> reaching the same conclusion as before). >> >> I guess the real reason is that canonicalize_loop_ivs calls >> create_iv first which in the parloop case with bump-in-latch true >> and then incrementally re-writes PHIs, eventually wrecking other >> PHIs SCEV. In this case the 2nd PHIs evolution depends on the >> first one but the rewritten ones depend on the new IV already. >> >> So the better fix (maybe not 100% enough) would be to re-organize >> canonicalize_loop_ivs to first do analysis of all PHIs and then >> rewrite them all. >> > > Focusing on the re-organize first, is this sort of what you had in mind? FYI, we have multiple interfaces manipulating on IVs, sometimes in a similar way. An example would be create_canonical_iv in tree-ssa-loop-ivcanon.c. One proposal for discussion is to refactor such interfaces and put them in a single source file, like tree-ssa-loop-ivopts. For example of create_canonical_iv, it can be removed by simply adding the corresponding candidate in IVOPTs. Thanks, bin > > Bootstrapped and reg-tested on x86_64. > > Thanks, > - Tom
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On 03/12/2018 01:14 PM, Richard Biener wrote: On Thu, 22 Feb 2018, Tom de Vries wrote: Hi, this patch fixes an ICE in the parloops pass. The ICE (when compiling the test-case in attached patch) follows from the fact that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to "base all the induction variables in LOOP on a single control one": ... /* Base all the induction variables in LOOP on a single control one.*/ canonicalize_loop_ivs (loop, &nit, true); ... This is caused by the fact that for one of the induction variables, simple_iv no longer returns true (as was the case at the start of gen_parallel_loop). This seems to be triggered by the fact that during loop_version we call scev_reset (although I'm not sure why when recalculating scev info we're not reaching the same conclusion as before). I guess the real reason is that canonicalize_loop_ivs calls create_iv first which in the parloop case with bump-in-latch true and then incrementally re-writes PHIs, eventually wrecking other PHIs SCEV. In this case the 2nd PHIs evolution depends on the first one but the rewritten ones depend on the new IV already. So the better fix (maybe not 100% enough) would be to re-organize canonicalize_loop_ivs to first do analysis of all PHIs and then rewrite them all. Focusing on the re-organize first, is this sort of what you had in mind? Bootstrapped and reg-tested on x86_64. Thanks, - Tom Split canonicalize_loop_ivs in analyze and rewrite parts 2018-03-21 Tom de Vries * tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter. Add iv parameter. Move early-out tests ... (rewrite_all_phi_nodes_with_iv): ... here. Remove loop argument and add iv argument to rewrite_phi_with_iv call. (get_all_phi_affine_ivs): New function. (canonicalize_loop_ivs): Call get_all_phi_affine_ivs and pass result as argument to rewrite_all_phi_nodes_with_iv. --- gcc/tree-ssa-loop-manip.c | 82 +++ 1 file changed, 61 insertions(+), 21 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index bf425af..15a5795 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "params.h" #include "tree-inline.h" +#include "hash-map.h" /* All bitmaps for rewriting into loop-closed SSA go on this obstack, so that we can free them all at once. */ @@ -1430,48 +1431,65 @@ tree_unroll_loop (struct loop *loop, unsigned factor, induction variable MAIN_IV and insert the generated code at GSI. */ static void -rewrite_phi_with_iv (loop_p loop, - gphi_iterator *psi, +rewrite_phi_with_iv (gphi_iterator *psi, gimple_stmt_iterator *gsi, - tree main_iv) + tree main_iv, affine_iv *iv) { - affine_iv iv; gassign *stmt; gphi *phi = psi->phi (); tree atype, mtype, val, res = PHI_RESULT (phi); - if (virtual_operand_p (res) || res == main_iv) -{ - gsi_next (psi); - return; -} - - if (!simple_iv (loop, loop, res, &iv, true)) -{ - gsi_next (psi); - return; -} - remove_phi_node (psi, false); atype = TREE_TYPE (res); mtype = POINTER_TYPE_P (atype) ? sizetype : atype; - val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step), + val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step), fold_convert (mtype, main_iv)); val = fold_build2 (POINTER_TYPE_P (atype) ? POINTER_PLUS_EXPR : PLUS_EXPR, - atype, unshare_expr (iv.base), val); + atype, unshare_expr (iv->base), val); val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true, GSI_SAME_STMT); stmt = gimple_build_assign (res, val); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); } +/* Return map of phi node result to affine_iv, for all phis in LOOPS. */ + +static void +get_all_phi_affine_ivs (loop_p loop, hash_map *simple_ivs) +{ + unsigned i; + basic_block *bbs = get_loop_body_in_dom_order (loop); + gphi_iterator psi; + + for (i = 0; i < loop->num_nodes; i++) +{ + basic_block bb = bbs[i]; + + if (bb->loop_father != loop) + continue; + + for (psi = gsi_start_phis (bb); !gsi_end_p (psi); + gsi_next (&psi)) + { + gphi *phi = psi.phi (); + affine_iv iv; + tree res = PHI_RESULT (phi); + if (simple_iv (loop, loop, res, &iv, true)) + simple_ivs->put (res, iv); + } +} + + free (bbs); +} + /* Rewrite all the phi nodes of LOOP in function of the main induction variable MAIN_IV. */ static void -rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv) +rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv, + hash_map *simple_ivs) { unsigned i; basic_block *bbs = get_loop_body_in_dom_order (loop); @@ -1486,7 +1504,25 @@ rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv) continue; for (psi = gsi_start_phis (bb); !gsi_end_p (psi); ) - rewr
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On Wed, 21 Mar 2018, Tom de Vries wrote: > On 03/12/2018 01:14 PM, Richard Biener wrote: > > On Thu, 22 Feb 2018, Tom de Vries wrote: > > > I can > > > rework the bit of the patch that adds an assert after > > > canonicalize_loop_ivs > > > into a patch that aborts the optimization instead of ICE-ing. > > > > I think that's something reasonable anyway. > > > > Patch attached below implements that approach. > > The difference between before parloops, and after ompexpssa2 is this (showing > the effects of canonicalize_loop_ivs): > ... > $ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2 >... > ew (short unsigned int c9, int stuff) > { >int * fd; > @@ -9,53 +103,53 @@ >int _3; >short unsigned int _4; >unsigned int _5; > + unsigned int ivtmp_8; >unsigned int _22; >unsigned int ivtmp_24; >unsigned int ivtmp_26; > + unsigned int ivtmp_28; > > [local count: 11811]: > > [local count: 118111601]: > - # c9_1 = PHI > + # c9_1 = PHI >_2 = (long int) c9_1; >fd_13 = (int *) _2; >_3 = *fd_13; > > [local count: 1073741825]: >if (_3 != 0) > -goto ; [11.00%] > +goto ; [11.00%] >else > goto ; [89.00%] > > [local count: 955630225]: >goto ; [100.00%] > > - [local count: 118111600]: > + [local count: 94489281]: > > - [local count: 118111600]: > - > - [local count: 236258638]: > - # _22 = PHI <0(10), _5(8)> > - # c9_23 = PHI > - # e1_27 = PHI <0(10), e1_17(8)> > - # ivtmp_26 = PHI <2(10), ivtmp_24(8)> > + [local count: 189006911]: > + # c9_23 = PHI > + # e1_27 = PHI > + # ivtmp_8 = PHI > + _22 = ivtmp_8; > + ivtmp_26 = 2 - ivtmp_8; >_4 = (short unsigned int) e1_27; >c9_14 = _4 * c9_23; >_5 = _22 + 1; >e1_17 = (int) _5; >ivtmp_24 = ivtmp_26 - 1; > - if (ivtmp_24 != 0) > + if (ivtmp_8 < 1) > goto ; [66.67%] >else > goto ; [33.33%] > > - [local count: 157513634]: > + [local count: 126010908]: > + ivtmp_28 = ivtmp_8 + 1; >goto ; [100.00%] > > [local count: 78745004]: ># c9_21 = PHI > - > - [local count: 78745004]: >goto ; [100.00%] > > } > ... > > > Bootstrapped and reg-tested on x86_64. > > OK for stage4 trunk? OK. Thanks, Richard.
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On 03/12/2018 01:14 PM, Richard Biener wrote: On Thu, 22 Feb 2018, Tom de Vries wrote: I can rework the bit of the patch that adds an assert after canonicalize_loop_ivs into a patch that aborts the optimization instead of ICE-ing. I think that's something reasonable anyway. Patch attached below implements that approach. The difference between before parloops, and after ompexpssa2 is this (showing the effects of canonicalize_loop_ivs): ... $ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2 ... ew (short unsigned int c9, int stuff) { int * fd; @@ -9,53 +103,53 @@ int _3; short unsigned int _4; unsigned int _5; + unsigned int ivtmp_8; unsigned int _22; unsigned int ivtmp_24; unsigned int ivtmp_26; + unsigned int ivtmp_28; [local count: 11811]: [local count: 118111601]: - # c9_1 = PHI + # c9_1 = PHI _2 = (long int) c9_1; fd_13 = (int *) _2; _3 = *fd_13; [local count: 1073741825]: if (_3 != 0) -goto ; [11.00%] +goto ; [11.00%] else goto ; [89.00%] [local count: 955630225]: goto ; [100.00%] - [local count: 118111600]: + [local count: 94489281]: - [local count: 118111600]: - - [local count: 236258638]: - # _22 = PHI <0(10), _5(8)> - # c9_23 = PHI - # e1_27 = PHI <0(10), e1_17(8)> - # ivtmp_26 = PHI <2(10), ivtmp_24(8)> + [local count: 189006911]: + # c9_23 = PHI + # e1_27 = PHI + # ivtmp_8 = PHI + _22 = ivtmp_8; + ivtmp_26 = 2 - ivtmp_8; _4 = (short unsigned int) e1_27; c9_14 = _4 * c9_23; _5 = _22 + 1; e1_17 = (int) _5; ivtmp_24 = ivtmp_26 - 1; - if (ivtmp_24 != 0) + if (ivtmp_8 < 1) goto ; [66.67%] else goto ; [33.33%] - [local count: 157513634]: + [local count: 126010908]: + ivtmp_28 = ivtmp_8 + 1; goto ; [100.00%] [local count: 78745004]: # c9_21 = PHI - - [local count: 78745004]: goto ; [100.00%] } ... Bootstrapped and reg-tested on x86_64. OK for stage4 trunk? Thanks, - Tom [parloops] Handle canonicalize_loop_ivs failure 2018-03-19 Tom de Vries PR tree-optimization/83126 * tree-parloops.c (num_phis): New function. (gen_parallel_loop): Detect and handle canonicalize_loop_ivs failure. * gcc.dg/graphite/pr83126.c: New test. --- gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 gcc/tree-parloops.c | 39 + 2 files changed, 58 insertions(+) diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c new file mode 100644 index 000..663d059 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c @@ -0,0 +1,19 @@ +/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" } */ + +void +ew (unsigned short int c9, int stuff) +{ + int e1; + + for (;;) +{ + unsigned int *by = &e1; + int *fd = &stuff; + + *fd = c9; + fd = *fd; + if (*fd != 0) + for (*by = 0; *by < 2; ++*by) + c9 *= e1; +} +} diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index e44ad5e..3a788cc 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2235,6 +2235,25 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, calculate_dominance_info (CDI_DOMINATORS); } +/* Return number of phis in bb. If COUNT_VIRTUAL_P is false, don't count the + virtual phi. */ + +static unsigned int +num_phis (basic_block bb, bool count_virtual_p) +{ + unsigned int nr_phis = 0; + gphi_iterator gsi; + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi ( + continue; + + nr_phis++; +} + + return nr_phis; +} + /* Generates code to execute the iterations of LOOP in N_THREADS threads in parallel, which can be 0 if that number is to be determined later. @@ -2370,6 +2389,26 @@ gen_parallel_loop (struct loop *loop, /* Base all the induction variables in LOOP on a single control one. */ canonicalize_loop_ivs (loop, &nit, true); + if (num_phis (loop->header, false) != reduction_list->elements () + 1) +{ + /* The call to canonicalize_loop_ivs above failed to "base all the + induction variables in LOOP on a single control one". Do damage + control. */ + basic_block preheader = loop_preheader_edge (loop)->src; + basic_block cond_bb = single_pred (preheader); + gcond *cond = as_a (gsi_stmt (gsi_last_bb (cond_bb))); + gimple_cond_make_true (cond); + update_stmt (cond); + /* We've gotten rid of the duplicate loop created by loop_version, but + we can't undo whatever canonicalize_loop_ivs has done. + TODO: Fix this properly by ensuring that the call to + canonicalize_loop_ivs succeeds. */ + if (dump_file + && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "canonicalize_loop_ivs failed for loop %d," + " aborting transformation\n", loop->num); + return; +} /* Ensure that the exit condition is th
Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
On Thu, 22 Feb 2018, Tom de Vries wrote: > Hi, > > this patch fixes an ICE in the parloops pass. > > The ICE (when compiling the test-case in attached patch) follows from the fact > that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to > "base all the induction variables in LOOP on a single control one": > ... > /* Base all the induction variables in LOOP on a single control one.*/ > canonicalize_loop_ivs (loop, &nit, true); > ... > > This is caused by the fact that for one of the induction variables, simple_iv > no longer returns true (as was the case at the start of gen_parallel_loop). > This seems to be triggered by the fact that during loop_version we call > scev_reset (although I'm not sure why when recalculating scev info we're not > reaching the same conclusion as before). I guess the real reason is that canonicalize_loop_ivs calls create_iv first which in the parloop case with bump-in-latch true and then incrementally re-writes PHIs, eventually wrecking other PHIs SCEV. In this case the 2nd PHIs evolution depends on the first one but the rewritten ones depend on the new IV already. So the better fix (maybe not 100% enough) would be to re-organize canonicalize_loop_ivs to first do analysis of all PHIs and then rewrite them all. > > The patch fixes the ICE by caching the affine_ivs as calculated by simple_iv > before calling loop_version, and reusing those in canonicalize_loop_ivs > (instead of calling simple_iv again). > > Bootstrapped and reg-tested on x86_64. > > OK for stage1? First of all please to not use std::map but instead our own hash_map, the proper way of including would be to define INCLUDE_MAP before the system.h include. But see above. > I'm not sure if this is minimal/low-impact enough for stage4. If not, I can > rework the bit of the patch that adds an assert after canonicalize_loop_ivs > into a patch that aborts the optimization instead of ICE-ing. I think that's something reasonable anyway. Thanks, Richard. > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs
Hi, this patch fixes an ICE in the parloops pass. The ICE (when compiling the test-case in attached patch) follows from the fact that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to "base all the induction variables in LOOP on a single control one": ... /* Base all the induction variables in LOOP on a single control one.*/ canonicalize_loop_ivs (loop, &nit, true); ... This is caused by the fact that for one of the induction variables, simple_iv no longer returns true (as was the case at the start of gen_parallel_loop). This seems to be triggered by the fact that during loop_version we call scev_reset (although I'm not sure why when recalculating scev info we're not reaching the same conclusion as before). The patch fixes the ICE by caching the affine_ivs as calculated by simple_iv before calling loop_version, and reusing those in canonicalize_loop_ivs (instead of calling simple_iv again). Bootstrapped and reg-tested on x86_64. OK for stage1? I'm not sure if this is minimal/low-impact enough for stage4. If not, I can rework the bit of the patch that adds an assert after canonicalize_loop_ivs into a patch that aborts the optimization instead of ICE-ing. Thanks, - Tom [parloops] Use cached affine_ivs canonicalize_loop_ivs 2018-02-22 Tom de Vries PR tree-optimization/83126 * tree-ssa-loop-manip.c (rewrite_phi_with_iv): Add and handle piv parameter. (rewrite_all_phi_nodes_with_iv, canonicalize_loop_ivs): Add and handle simple_ivs parameter. * tree-ssa-loop-manip.h (canonicalize_loop_ivs): Declare simple_ivs parameter and default to NULL. * tree-parloops.c (num_phis): New function. (gen_parallel_loop): Add simple_ivs argument to canonicalize_loop_ivs call. --- gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 +++ gcc/tree-parloops.c | 43 +++-- gcc/tree-ssa-loop-manip.c | 41 --- gcc/tree-ssa-loop-manip.h | 6 - 4 files changed, 97 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c new file mode 100644 index 000..663d059 --- /dev/null +++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c @@ -0,0 +1,19 @@ +/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" } */ + +void +ew (unsigned short int c9, int stuff) +{ + int e1; + + for (;;) +{ + unsigned int *by = &e1; + int *fd = &stuff; + + *fd = c9; + fd = *fd; + if (*fd != 0) + for (*by = 0; *by < 2; ++*by) + c9 *= e1; +} +} diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index e44ad5e..5971680 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2235,6 +2235,25 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, calculate_dominance_info (CDI_DOMINATORS); } +/* Return number of phis in bb. If COUNT_VIRTUAL_P is false, don't count the + virtual phi. */ + +static unsigned int +num_phis (basic_block bb, bool count_virtual_p) +{ + unsigned int nr_phis = 0; + gphi_iterator gsi; + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi ( + continue; + + nr_phis++; +} + + return nr_phis; +} + /* Generates code to execute the iterations of LOOP in N_THREADS threads in parallel, which can be 0 if that number is to be determined later. @@ -2326,6 +2345,24 @@ gen_parallel_loop (struct loop *loop, if (stmts) gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts); + /* The loop_version call below calls gimple_duplicate_loop_to_header_edge, + which calls scev_reset. So a loop header phi node that classifies as + simple_iv here, may not classify as such anymore after loop_version, which + will make canonicalize_loop_ivs fail to "base all the induction variables + in LOOP on a single control one". We fix this by storing the affine_iv + result of the simple_iv call in a map from phi node result to affine_iv, + and passing that map as an argument to canonicalize_loop_ivs. */ + std::map simple_ivs; + gphi_iterator psi; + for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi)) +{ + gphi *phi = psi.phi (); + affine_iv iv; + tree res = PHI_RESULT (phi); + if (simple_iv (loop, loop, res, &iv, true)) + simple_ivs.insert (std::pair (res, iv)); +} + if (!oacc_kernels_p) { if (loop->inner) @@ -2364,12 +2401,14 @@ gen_parallel_loop (struct loop *loop, profile_probability::unlikely (), profile_probability::likely (), profile_probability::unlikely (), true); - update_ssa (TODO_update_ssa); free_original_copy_tables (); } /* Base all the induction variables in LOOP on a single control one. */ - canonicalize_loop_ivs (loop, &nit, true); + canonicalize_loop_iv