Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>> Suppose a loop as: >> >> void f (std::map m) >> { >> for (auto it = m.begin (); it != m.end (); ++it) { >> /* if (b) is semi-invariant. */ >> if (b) { >> b = do_something();/* Has effect on b */ >> } else { >> /* No effect on b */ >> } >> statements; /* Also no effect on b */ >> } >> } >> >> A transformation, kind of loop split, could be: >> >> void f (std::map m) >> { >> for (auto it = m.begin (); it != m.end (); ++it) { >> if (b) { >> b = do_something(); >> } else { >> ++it; >> statements; >> break; >> } >> statements; >> } >> >> for (; it != m.end (); ++it) { >> statements; >> } >> } > So if you consider approaching this from unswitching instead we'd > unswitch it on if (b) but > treat the condition as constant only in the 'false' path, thus the > transformed code would > look like the following. I believe implementing this in the existing > unswitching pass > involves a lot less code than putting it into the splitting pass but > it would catch > exactly the same cases? May not. Firstly, the following transformation is legal only when "b" is semi-invariant, which means once a branch of "if (b)" is selected since certain iteration, execution will always go to that branch in all following iterations. Most of code in this loop-split patch was composed to check semi-invariantness of a conditional expression, which must also be needed in loop-unswitch solution. Secondly, to duplicate/move an invariant expression out of loop is simple, for all intermediate computations should already be outside, only need to handle its result SSA that is inside the loop. But for semi-invariant, we have to duplicate a tree of statements that contributes to computation of the condition expression, similar to code hoisting, which make us write extra code. And loop-unswitch solution is weaker than loop-split. Suppose initial value of "b" is true, and is changed to false after some iterations, not only unswitch does not help that, but also it introduces extra cost due to enclosing "if (b)". > if (b) >{ > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is non-invariant. */ > if (b) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > } > else > { > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is invariant. */ > if (false) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > }
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
On Mon, Jul 15, 2019 at 4:20 AM Feng Xue OS wrote: > > Some time passed, so ping again. I made this patch, because it can reward us > with 7% > > performance benefit in some real application. For convenience, the > optimization to be > > implemented was listed in the following again. And hope your comments on the > patch, or > > design suggestions. Thanks! Replying again to the very first post since it contains the figure below. > Suppose a loop as: > > void f (std::map m) > { > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is semi-invariant. */ > if (b) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > } > > A transformation, kind of loop split, could be: > > void f (std::map m) > { > for (auto it = m.begin (); it != m.end (); ++it) { > if (b) { > b = do_something(); > } else { > ++it; > statements; > break; > } > statements; > } > > for (; it != m.end (); ++it) { > statements; > } > } So if you consider approaching this from unswitching instead we'd unswitch it on if (b) but treat the condition as constant only in the 'false' path, thus the transformed code would look like the following. I believe implementing this in the existing unswitching pass involves a lot less code than putting it into the splitting pass but it would catch exactly the same cases? if (b) { for (auto it = m.begin (); it != m.end (); ++it) { /* if (b) is non-invariant. */ if (b) { b = do_something();/* Has effect on b */ } else { /* No effect on b */ } statements; /* Also no effect on b */ } } else { for (auto it = m.begin (); it != m.end (); ++it) { /* if (b) is invariant. */ if (false) { b = do_something();/* Has effect on b */ } else { /* No effect on b */ } statements; /* Also no effect on b */ } } > If "statements" contains nothing, the second loop becomes an empty one, which > can be removed. > And if "statements" are straight line instructions, we get an opportunity to > vectorize the second loop. > > Feng > > > > From: Feng Xue OS > Sent: Tuesday, June 18, 2019 3:00 PM > To: Richard Biener; Michael Matz > Cc: gcc-patches@gcc.gnu.org > Subject: Ping: [PATCH V2] Loop split upon semi-invariant condition (PR > tree-optimization/89134) > > Richard & Michael, > >I made some adjustments on coding style and added test cases for this > version. > >Would you please take a look at the patch? It is long a little bit and > might steal some >of your time. > > Thanks a lot. > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 9a46f93d89d..2334b184945 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,23 @@ > +2019-06-18 Feng Xue > + > + PR tree-optimization/89134 > + * doc/invoke.texi (max-cond-loop-split-insns): Document new --params. > + (min-cond-loop-split-prob): Likewise. > + * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob. > + * passes.def (pass_cond_loop_split) : New pass. > + * timevar.def (TV_COND_LOOP_SPLIT): New time variable. > + * tree-pass.h (make_pass_cond_loop_split): New declaration. > + * tree-ssa-loop-split.c (split_info): New class. > + (find_vdef_in_loop, vuse_semi_invariant_p): New functions. > + (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise. > + (branch_removable_p, get_cond_invariant_branch): Likewise. > + (is_cond_in_hidden_loop, compute_added_num_insns): Likewise. > + (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise. > + (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise. > + (pass_data_cond_loop_split): New variable. > + (pass_cond_loop_split): New class. > + (make_pass_cond_loop_split): New function. > + > 2019-06-18 Kewen Lin > > PR middle-end/80791 > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index eaef4cd63d2..0427fede3d6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a > single loop. > @item lim-expensive > The minimum cost of an expensive expression in the loop invariant motion. > > +@item max-cond-loop-split-insns > +The maximum number of insns
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
Hi, Michael, Since I was involved in other tasks, it is a little bit late to reply you. Sorry for that. I composed a new one with your suggestions. Please review that when you are in convenience. > Generally I do like the idea of the transformation, and the basic building > blocks seem to be sound. But I dislike it being a separate pass, so > please integrate the code you have written into the existing loop split > pass. Most building blocks can be used as is, except the main driver. This new transformation was integrated into the pass of original loop split. >> +@item max-cond-loop-split-insns >> +The maximum number of insns to be increased due to loop split on >> +semi-invariant condition statement. > "to be increased" --> "to be generated" (or "added") Done. >> +@item min-cond-loop-split-prob >> +The minimum threshold for probability of semi-invaraint condition >> +statement to trigger loop split. > typo, semi-invariant Done. > I think somewhere in the docs your definition of semi-invariant needs > to be stated in some form (can be short, doesn't need to reproduce the > diagram or such), so don't just replicate the short info from the > params.def file. Done. >> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB, >> + "min-cond-loop-split-prob", >> + "The minimum threshold for probability of semi-invaraint condition " >> + "statement to trigger loop split.", > Same typo: "semi-invariant". Done. >> -/* This file implements loop splitting, i.e. transformation of loops like >> +/* This file implements two kind of loop splitting. > kind_s_, plural Done. >> +/* Another transformation of loops like: >> + >> + for (i = INIT (); CHECK (i); i = NEXT ()) >> + { >> + if (expr (a_1, a_2, ..., a_n)) >> + a_j = ...; // change at least one a_j >> + else >> + S; // not change any a_j >> + } > You should mention that 'expr' needs to be pure, i.e. once it > becomes false and the inputs don't change, that it remains false. Done. >> +static bool >> +branch_removable_p (basic_block branch_bb) >> +{ >> + if (single_pred_p (branch_bb)) >> +return true; >> + >> + edge e; >> + edge_iterator ei; >> + >> + FOR_EACH_EDGE (e, ei, branch_bb->preds) >> +{ >> + if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb)) >> + continue; >> + >> + if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src)) >> + continue; > My gut feeling is surprised by this. So one of the predecessors of > branch_bb dominates it. Why should that indicate that branch_bb > can be safely removed? > > Think about something like this: > > esrc --> cond_bb --> branch_bb > '---^ If all predecessors of branch_bb dominate it, these predecessors should also be in dominating relationship among them, and the conditional statement must be branch_bb's immediate dominator, and branch_bb is removable. In your example. For "esrc", loop is continued, nothing is impacted. But in the next iteration, we encounter "cond_bb", it does not dominate "branch_bb", so the function return false in the following return statement. > (cond_bb is the callers bb of the cond statement in question). Now esrc > dominates branch_bb but still you can't simply remove it, even if > the cond_bb->branch_bb edge becomes unexecutable. >> +static int >> +get_cond_invariant_branch (struct loop *loop, gcond *cond) > Please return an edge here, not an edge index (which removes the using of > -1). I think the name (and comment) should consistently talk about > semi-invariant, not invariant. For when you really need an edge index > later, you could use "EDGE_SUCC(bb, 0) != edge". But you probably don't > really need it, e.g. instead of using the gimple pass-local-flag on a > statement you can just as well also store the edge in your info structure. Done. >> +static bool >> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb, >> + int branch) > With above change in get_cond_invariant_branch, this also should > take an edge, not a bb+edge-index. Done. >> +static int >> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb, >> +int branch) > This should take an edge as well. Done. >> + for (unsigned i = 0; i < loop->num_nodes; i++) >> +{ >> + /* Do no count basic blocks only in opposite branch. */ >> + if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var)) >> + continue; >> + >> + for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p >> (gsi); >> + gsi_next (&gsi)) >> + num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights); > Replace the loop by > estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights); Done. >> + /* Add a threshold for increased code size to disable loop split. */ >> + if (compute_added_num_insns (loop, cond_bb, branch) > >> + PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS)) > Operator should
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
Thanks for these comments. Feng From: Michael Matz Sent: Tuesday, July 30, 2019 1:59:04 AM To: Feng Xue OS Cc: Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134) Hello Feng, first, sorry for the terrible delay in reviewing, but here is one now :) Generally I do like the idea of the transformation, and the basic building blocks seem to be sound. But I dislike it being a separate pass, so please integrate the code you have written into the existing loop split pass. Most building blocks can be used as is, except the main driver. The existing loop-split code uses loop->aux as binary marker and analyses and transforms loops in one go, you're doing it separately. That separation makes sense for you, so the existing code should be changed to also do that separately. Some info for the existing loop-split analysis needs to be stored in the info struct then as well, which can be done if you add some fields in yours. Some splitting-out of code from the existing main driver is probably needed (basically the parts that determine eligibility and which cond statement to split). The two routines that actually split the loops (those that call loop_version) also have an overlap, so maybe something more can be commonized between the two (ultimately the way of splitting in both variants is different, so somewhere they'll do something else, but still some parts are common). So, with these general remarks, some more concrete ones about the patch: > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index eaef4cd63d2..0427fede3d6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a > single loop. > @item lim-expensive > The minimum cost of an expensive expression in the loop invariant motion. > > +@item max-cond-loop-split-insns > +The maximum number of insns to be increased due to loop split on > +semi-invariant condition statement. "to be increased" --> "to be generated" (or "added") > +@item min-cond-loop-split-prob > +The minimum threshold for probability of semi-invaraint condition > +statement to trigger loop split. typo, semi-invariant I think somewhere in the docs your definition of semi-invariant needs to be stated in some form (can be short, doesn't need to reproduce the diagram or such), so don't just replicate the short info from the params.def file. > diff --git a/gcc/params.def b/gcc/params.def > index 0db60951413..5384f7d1c4d 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL, > "The maximum number of unswitchings in a single loop.", > 3, 0, 0) > > +/* The maximum number of increased insns due to loop split on semi-invariant > + condition statement. */ > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS, > + "max-cond-loop-split-insns", > + "The maximum number of insns to be increased due to loop split on " > + "semi-invariant condition statement.", > + 100, 0, 0) > + > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB, > + "min-cond-loop-split-prob", > + "The minimum threshold for probability of semi-invaraint condition " > + "statement to trigger loop split.", Same typo: "semi-invariant". > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > index 999c9a30366..7239d0cfb00 100644 > --- a/gcc/tree-ssa-loop-split.c > +++ b/gcc/tree-ssa-loop-split.c > -/* This file implements loop splitting, i.e. transformation of loops like > +/* This file implements two kind of loop splitting. kind_s_, plural > + One transformation of loops like: > > for (i = 0; i < 100; i++) > { > @@ -670,6 +674,782 @@ tree_ssa_split_loops (void) >return 0; > } > > + > +/* Another transformation of loops like: > + > + for (i = INIT (); CHECK (i); i = NEXT ()) > + { > + if (expr (a_1, a_2, ..., a_n)) > + a_j = ...; // change at least one a_j > + else > + S; // not change any a_j > + } You should mention that 'expr' needs to be pure, i.e. once it becomes false and the inputs don't change, that it remains false. > + > + into: > + > + for (i = INIT (); CHECK (i); i = NEXT ()) > + { > + if (expr (a_1, a_2, ..., a_n)) > + a_j = ...; > + else > + { > + S; > + i = NEXT (); > + break; > + } > + } > + > + for (; CHECK (i); i = NEXT ()) > + { > +
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
Hello Feng, first, sorry for the terrible delay in reviewing, but here is one now :) Generally I do like the idea of the transformation, and the basic building blocks seem to be sound. But I dislike it being a separate pass, so please integrate the code you have written into the existing loop split pass. Most building blocks can be used as is, except the main driver. The existing loop-split code uses loop->aux as binary marker and analyses and transforms loops in one go, you're doing it separately. That separation makes sense for you, so the existing code should be changed to also do that separately. Some info for the existing loop-split analysis needs to be stored in the info struct then as well, which can be done if you add some fields in yours. Some splitting-out of code from the existing main driver is probably needed (basically the parts that determine eligibility and which cond statement to split). The two routines that actually split the loops (those that call loop_version) also have an overlap, so maybe something more can be commonized between the two (ultimately the way of splitting in both variants is different, so somewhere they'll do something else, but still some parts are common). So, with these general remarks, some more concrete ones about the patch: > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index eaef4cd63d2..0427fede3d6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a > single loop. > @item lim-expensive > The minimum cost of an expensive expression in the loop invariant motion. > > +@item max-cond-loop-split-insns > +The maximum number of insns to be increased due to loop split on > +semi-invariant condition statement. "to be increased" --> "to be generated" (or "added") > +@item min-cond-loop-split-prob > +The minimum threshold for probability of semi-invaraint condition > +statement to trigger loop split. typo, semi-invariant I think somewhere in the docs your definition of semi-invariant needs to be stated in some form (can be short, doesn't need to reproduce the diagram or such), so don't just replicate the short info from the params.def file. > diff --git a/gcc/params.def b/gcc/params.def > index 0db60951413..5384f7d1c4d 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL, > "The maximum number of unswitchings in a single loop.", > 3, 0, 0) > > +/* The maximum number of increased insns due to loop split on semi-invariant > + condition statement. */ > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS, > + "max-cond-loop-split-insns", > + "The maximum number of insns to be increased due to loop split on " > + "semi-invariant condition statement.", > + 100, 0, 0) > + > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB, > + "min-cond-loop-split-prob", > + "The minimum threshold for probability of semi-invaraint condition " > + "statement to trigger loop split.", Same typo: "semi-invariant". > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > index 999c9a30366..7239d0cfb00 100644 > --- a/gcc/tree-ssa-loop-split.c > +++ b/gcc/tree-ssa-loop-split.c > -/* This file implements loop splitting, i.e. transformation of loops like > +/* This file implements two kind of loop splitting. kind_s_, plural > + One transformation of loops like: > > for (i = 0; i < 100; i++) > { > @@ -670,6 +674,782 @@ tree_ssa_split_loops (void) >return 0; > } > > + > +/* Another transformation of loops like: > + > + for (i = INIT (); CHECK (i); i = NEXT ()) > + { > + if (expr (a_1, a_2, ..., a_n)) > + a_j = ...; // change at least one a_j > + else > + S; // not change any a_j > + } You should mention that 'expr' needs to be pure, i.e. once it becomes false and the inputs don't change, that it remains false. > + > + into: > + > + for (i = INIT (); CHECK (i); i = NEXT ()) > + { > + if (expr (a_1, a_2, ..., a_n)) > + a_j = ...; > + else > + { > + S; > + i = NEXT (); > + break; > + } > + } > + > + for (; CHECK (i); i = NEXT ()) > + { > + S; > + } > + > + */ > + ... > +/* Determine when conditional statement never transfers execution to one of > its > + branch, whether we can remove the branch's leading basic block (BRANCH_BB) > + and those basic blocks dominated by BRANCH_BB. */ > + > +static bool > +branch_removable_p (basic_block branch_bb) > +{ > + if (single_pred_p (branch_bb)) > +return true; > + > + edge e; > + edge_iterator ei; > + > + FOR_EACH_EDGE (e, ei, branch_bb->preds) > +{ > + if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb)) > + continue; > + > + if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src)) > + continue; My gut feeling is surp