Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-09-12 Thread Feng Xue OS
>> Suppose a loop as:
>>
>> void f (std::map m)
>> {
>> for (auto it = m.begin (); it != m.end (); ++it) {
>> /* if (b) is semi-invariant. */
>> if (b) {
>> b = do_something();/* Has effect on b */
>> } else {
>> /* No effect on b */
>> }
>> statements;  /* Also no effect on b */
>> }
>> }
>>
>> A transformation, kind of loop split, could be:
>>
>> void f (std::map m)
>> {
>> for (auto it = m.begin (); it != m.end (); ++it) {
>> if (b) {
>> b = do_something();
>> } else {
>> ++it;
>> statements;
>> break;
>> }
>> statements;
>> }
>>
>> for (; it != m.end (); ++it) {
>> statements;
>> }
>> }

> So if you consider approaching this from unswitching instead we'd
> unswitch it on if (b) but
> treat the condition as constant only in the 'false' path, thus the
> transformed code would
> look like the following.  I believe implementing this in the existing
> unswitching pass
> involves a lot less code than putting it into the splitting pass but
> it would catch
> exactly the same cases?

May not.

Firstly, the following transformation is legal only when "b" is
semi-invariant, which means once a branch of "if (b)" is selected since
certain iteration, execution will always go to that branch in all following
iterations. Most of code in this loop-split patch was composed to check 
semi-invariantness of a conditional expression, which must also be needed
in loop-unswitch solution.

Secondly, to duplicate/move an invariant expression out of loop is simple,
for all intermediate computations should already be outside, only need to 
handle its result SSA that is inside the loop. But for semi-invariant, we
have to duplicate a tree of statements that contributes to computation of
the condition expression, similar to code hoisting, which make us write 
extra code. 

And loop-unswitch solution is weaker than loop-split. Suppose initial
value of "b" is true, and is changed to false after some iterations, 
not only unswitch does not help that, but also it introduces extra cost
due to enclosing "if (b)".

>   if (b)
>{
> for (auto it = m.begin (); it != m.end (); ++it) {
>  /* if (b) is non-invariant. */
> if (b) {
> b = do_something();/* Has effect on b */
>  } else {
> /* No effect on b */
> }
> statements;  /* Also no effect on b */
> }
> }
>   else
> {
>   for (auto it = m.begin (); it != m.end (); ++it) {
>  /* if (b) is invariant. */
>  if (false) {
>  b = do_something();/* Has effect on b */
>  } else {
>  /* No effect on b */
>  }
>  statements;  /* Also no effect on b */
>  }
> }


Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-09-12 Thread Richard Biener
On Mon, Jul 15, 2019 at 4:20 AM Feng Xue OS  wrote:
>
> Some time passed, so ping again. I made this patch, because it can reward us 
> with 7%
>
> performance benefit in some real application. For convenience, the 
> optimization to be
>
> implemented was listed in the following again. And hope your comments on the 
> patch, or
>
> design suggestions. Thanks!

Replying again to the very first post since it contains the figure below.

> Suppose a loop as:
>
> void f (std::map m)
> {
> for (auto it = m.begin (); it != m.end (); ++it) {
> /* if (b) is semi-invariant. */
> if (b) {
> b = do_something();/* Has effect on b */
> } else {
> /* No effect on b */
> }
> statements;  /* Also no effect on b */
> }
> }
>
> A transformation, kind of loop split, could be:
>
> void f (std::map m)
> {
> for (auto it = m.begin (); it != m.end (); ++it) {
> if (b) {
> b = do_something();
> } else {
> ++it;
> statements;
> break;
> }
> statements;
> }
>
> for (; it != m.end (); ++it) {
> statements;
> }
> }

So if you consider approaching this from unswitching instead we'd
unswitch it on if (b) but
treat the condition as constant only in the 'false' path, thus the
transformed code would
look like the following.  I believe implementing this in the existing
unswitching pass
involves a lot less code than putting it into the splitting pass but
it would catch
exactly the same cases?

  if (b)
   {
for (auto it = m.begin (); it != m.end (); ++it) {
 /* if (b) is non-invariant. */
if (b) {
b = do_something();/* Has effect on b */
 } else {
/* No effect on b */
}
statements;  /* Also no effect on b */
}
}
  else
{
  for (auto it = m.begin (); it != m.end (); ++it) {
 /* if (b) is invariant. */
 if (false) {
 b = do_something();/* Has effect on b */
 } else {
 /* No effect on b */
 }
 statements;  /* Also no effect on b */
 }
}


> If "statements" contains nothing, the second loop becomes an empty one, which 
> can be removed.
> And if "statements" are straight line instructions, we get an opportunity to 
> vectorize the second loop.
>
> Feng
>
>
> 
> From: Feng Xue OS
> Sent: Tuesday, June 18, 2019 3:00 PM
> To: Richard Biener; Michael Matz
> Cc: gcc-patches@gcc.gnu.org
> Subject: Ping: [PATCH V2] Loop split upon semi-invariant condition (PR 
> tree-optimization/89134)
>
> Richard & Michael,
>
>I made some adjustments on coding style and added test cases for this 
> version.
>
>Would you please take a look at the patch? It is long a little bit and 
> might steal some
>of your time.
>
> Thanks a lot.
>
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a46f93d89d..2334b184945 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,23 @@
> +2019-06-18  Feng Xue 
> +
> +   PR tree-optimization/89134
> +   * doc/invoke.texi (max-cond-loop-split-insns): Document new --params.
> +   (min-cond-loop-split-prob): Likewise.
> +   * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob.
> +   * passes.def (pass_cond_loop_split) : New pass.
> +   * timevar.def (TV_COND_LOOP_SPLIT): New time variable.
> +   * tree-pass.h (make_pass_cond_loop_split): New declaration.
> +   * tree-ssa-loop-split.c (split_info): New class.
> +   (find_vdef_in_loop, vuse_semi_invariant_p): New functions.
> +   (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise.
> +   (branch_removable_p, get_cond_invariant_branch): Likewise.
> +   (is_cond_in_hidden_loop, compute_added_num_insns): Likewise.
> +   (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise.
> +   (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise.
> +   (pass_data_cond_loop_split): New variable.
> +   (pass_cond_loop_split): New class.
> +   (make_pass_cond_loop_split): New function.
> +
>  2019-06-18  Kewen Lin  
>
>  PR middle-end/80791
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a 
> single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns

Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-09-12 Thread Feng Xue OS
Hi, Michael,

  Since I was involved in other tasks, it is a little bit late to reply you. 
Sorry
for that. I composed a new one with your suggestions. Please review that
when you are in convenience. 

> Generally I do like the idea of the transformation, and the basic building
> blocks seem to be sound.  But I dislike it being a separate pass, so
> please integrate the code you have written into the existing loop split
> pass.  Most building blocks can be used as is, except the main driver.
This new transformation was integrated into the pass of original loop split.

>> +@item max-cond-loop-split-insns
>> +The maximum number of insns to be increased due to loop split on
>> +semi-invariant condition statement.

> "to be increased" --> "to be generated" (or "added")
Done.

>> +@item min-cond-loop-split-prob
>> +The minimum threshold for probability of semi-invaraint condition
>> +statement to trigger loop split.

> typo, semi-invariant
Done.

> I think somewhere in the docs your definition of semi-invariant needs
> to be stated in some form (can be short, doesn't need to reproduce the
> diagram or such), so don't just replicate the short info from the
> params.def file.
Done.

>> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
>> +   "min-cond-loop-split-prob",
>> +   "The minimum threshold for probability of semi-invaraint condition "
>> +   "statement to trigger loop split.",

> Same typo: "semi-invariant".
Done.

>> -/* This file implements loop splitting, i.e. transformation of loops like
>> +/* This file implements two kind of loop splitting.

> kind_s_, plural
Done.

>> +/* Another transformation of loops like:
>> +
>> +   for (i = INIT (); CHECK (i); i = NEXT ())
>> + {
>> +   if (expr (a_1, a_2, ..., a_n))
>> + a_j = ...;  // change at least one a_j
>> +   else
>> + S;  // not change any a_j
>> + }

> You should mention that 'expr' needs to be pure, i.e. once it
> becomes false and the inputs don't change, that it remains false.
Done.

>> +static bool
>> +branch_removable_p (basic_block branch_bb)
>> +{
>> +  if (single_pred_p (branch_bb))
>> +return true;
>> +
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
>> +{
>> +  if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
>> +   continue;
>> +
>> +  if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
>> +   continue;

> My gut feeling is surprised by this.  So one of the predecessors of
> branch_bb dominates it.  Why should that indicate that branch_bb
> can be safely removed?
>
> Think about something like this:
>
>   esrc --> cond_bb --> branch_bb
>   '---^

If all predecessors of branch_bb dominate it, these predecessors should also
be in dominating relationship among them, and the conditional statement must
be branch_bb's immediate dominator, and branch_bb is removable. In your example.

For "esrc", loop is continued, nothing is impacted. But in the next iteration, 
we
encounter "cond_bb", it does not dominate "branch_bb", so the function return
false in the following return statement.

> (cond_bb is the callers bb of the cond statement in question).  Now esrc
> dominates branch_bb but still you can't simply remove it, even if
> the cond_bb->branch_bb edge becomes unexecutable.


>> +static int
>> +get_cond_invariant_branch (struct loop *loop, gcond *cond)

> Please return an edge here, not an edge index (which removes the using of
> -1).  I think the name (and comment) should consistently talk about
> semi-invariant, not invariant.  For when you really need an edge index
> later, you could use "EDGE_SUCC(bb, 0) != edge".  But you probably don't
> really need it, e.g. instead of using the gimple pass-local-flag on a
> statement you can just as well also store the edge in your info structure.
Done.

>> +static bool
>> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb,
>> +   int branch)

> With above change in get_cond_invariant_branch, this also should
> take an edge, not a bb+edge-index.
Done.

>> +static int
>> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb,
>> +int branch)

> This should take an edge as well.
Done.

>> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> +{
>> +  /* Do no count basic blocks only in opposite branch.  */
>> +  if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var))
>> +   continue;
>> +
>> +  for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p 
>> (gsi);
>> +  gsi_next (&gsi))
>> +   num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);

> Replace the loop by
>  estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights);
Done.


>> +  /* Add a threshold for increased code size to disable loop split.  */
>> +  if (compute_added_num_insns (loop, cond_bb, branch) >
>> +  PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS))

> Operator should 

Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-07-31 Thread Feng Xue OS
Thanks for these comments.

Feng


From: Michael Matz 
Sent: Tuesday, July 30, 2019 1:59:04 AM
To: Feng Xue OS
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition 
(PR tree-optimization/89134)

Hello Feng,

first, sorry for the terrible delay in reviewing, but here is one now :)

Generally I do like the idea of the transformation, and the basic building
blocks seem to be sound.  But I dislike it being a separate pass, so
please integrate the code you have written into the existing loop split
pass.  Most building blocks can be used as is, except the main driver.

The existing loop-split code uses loop->aux as binary marker and analyses
and transforms loops in one go, you're doing it separately.  That
separation makes sense for you, so the existing code should be changed to
also do that separately.  Some info for the existing loop-split analysis
needs to be stored in the info struct then as well, which can be done if
you add some fields in yours.  Some splitting-out of code from the
existing main driver is probably needed (basically the parts that
determine eligibility and which cond statement to split).

The two routines that actually split the loops (those that call
loop_version) also have an overlap, so maybe something more can be
commonized between the two (ultimately the way of splitting in both
variants is different, so somewhere they'll do something else, but still
some parts are common).

So, with these general remarks, some more concrete ones about the patch:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a 
> single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.

"to be increased" --> "to be generated" (or "added")

> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.

typo, semi-invariant
I think somewhere in the docs your definition of semi-invariant needs
to be stated in some form (can be short, doesn't need to reproduce the
diagram or such), so don't just replicate the short info from the
params.def file.

> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>  "The maximum number of unswitchings in a single loop.",
>  3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +   "max-cond-loop-split-insns",
> +   "The maximum number of insns to be increased due to loop split on "
> +   "semi-invariant condition statement.",
> +   100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +   "min-cond-loop-split-prob",
> +   "The minimum threshold for probability of semi-invaraint condition "
> +   "statement to trigger loop split.",

Same typo: "semi-invariant".

> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kind of loop splitting.

kind_s_, plural

> +   One transformation of loops like:
>
> for (i = 0; i < 100; i++)
>   {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;  // change at least one a_j
> +   else
> + S;  // not change any a_j
> + }

You should mention that 'expr' needs to be pure, i.e. once it
becomes false and the inputs don't change, that it remains false.

> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;
> +   else
> + {
> +   S;
> +   i = NEXT ();
> +   break;
> + }
> + }
> +
> +   for (; CHECK (i); i = NEXT ())
> + {
> +

Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-07-29 Thread Michael Matz
Hello Feng,

first, sorry for the terrible delay in reviewing, but here is one now :)

Generally I do like the idea of the transformation, and the basic building 
blocks seem to be sound.  But I dislike it being a separate pass, so 
please integrate the code you have written into the existing loop split 
pass.  Most building blocks can be used as is, except the main driver.

The existing loop-split code uses loop->aux as binary marker and analyses 
and transforms loops in one go, you're doing it separately.  That 
separation makes sense for you, so the existing code should be changed to 
also do that separately.  Some info for the existing loop-split analysis 
needs to be stored in the info struct then as well, which can be done if 
you add some fields in yours.  Some splitting-out of code from the 
existing main driver is probably needed (basically the parts that 
determine eligibility and which cond statement to split).

The two routines that actually split the loops (those that call 
loop_version) also have an overlap, so maybe something more can be 
commonized between the two (ultimately the way of splitting in both 
variants is different, so somewhere they'll do something else, but still 
some parts are common).

So, with these general remarks, some more concrete ones about the patch:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a 
> single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
> 
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.

"to be increased" --> "to be generated" (or "added")

> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.

typo, semi-invariant
I think somewhere in the docs your definition of semi-invariant needs
to be stated in some form (can be short, doesn't need to reproduce the 
diagram or such), so don't just replicate the short info from the 
params.def file.

> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>  "The maximum number of unswitchings in a single loop.",
>  3, 0, 0)
> 
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +   "max-cond-loop-split-insns",
> +   "The maximum number of insns to be increased due to loop split on "
> +   "semi-invariant condition statement.",
> +   100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +   "min-cond-loop-split-prob",
> +   "The minimum threshold for probability of semi-invaraint condition "
> +   "statement to trigger loop split.",

Same typo: "semi-invariant".

> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kind of loop splitting.

kind_s_, plural

> +   One transformation of loops like:
> 
> for (i = 0; i < 100; i++)
>   {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>return 0;
>  }
> 
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;  // change at least one a_j
> +   else
> + S;  // not change any a_j
> + }

You should mention that 'expr' needs to be pure, i.e. once it
becomes false and the inputs don't change, that it remains false.

> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;
> +   else
> + {
> +   S;
> +   i = NEXT ();
> +   break;
> + }
> + }
> +
> +   for (; CHECK (i); i = NEXT ())
> + {
> +   S;
> + }
> +
> +   */
> +
...
> +/* Determine when conditional statement never transfers execution to one of 
> its
> +   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +return true;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, branch_bb->preds)
> +{
> +  if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb))
> +   continue;
> +
> +  if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src))
> +   continue;

My gut feeling is surp