Re: [PATCH][GRAPHITE] Simplify SCOP detection

2017-09-28 Thread Sebastian Pop
On Wed, Sep 27, 2017 at 2:21 AM, Richard Biener  wrote:
> On Tue, 26 Sep 2017, Sebastian Pop wrote:
>
>> On Tue, Sep 26, 2017 at 7:03 AM, Richard Biener  wrote:
>>
>> >
>> > The following is the result of me trying to understand SCOP detection
>> > and the validity checks spread around the machinery.  It removes several
>> > quadraticnesses by folding validity checks into
>> > scop_detection::harmful_loop_in_region where we already walk over all
>> > BBs in the region and process individual found loops.
>> >
>> > It also rewrites build_scop_depth/build_scop_breadth into something
>> > I can undestand.
>> >
>> > Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
>> > for all langs is happy, so is SPEC CPU 2006 testing where the statistics
>> > agree before/after the patch).
>> >
>> > I'll apply this after the bootstrap finished.
>> >
>>
>> Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"?
>
> I do "-O2 -g -floop-nest-optimize"

Very good.

> but I guess -fgraphite-identity
> should catch more issues?

It would systematically exercise the scop detection and code generation.
When isl scheduler does not find a better schedule, we would not bother
running the code gen part.

> Hmm, maybe -floop-nest-optimize and
> -fgraphite-identity should be combinable
>
> Index: gcc/graphite-optimize-isl.c
> ===
> --- gcc/graphite-optimize-isl.c (revision 253203)
> +++ gcc/graphite-optimize-isl.c (working copy)
> @@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
> print_schedule_ast (dump_file, scop->original_schedule, scop);
>isl_schedule_free (scop->transformed_schedule);
>scop->transformed_schedule = isl_schedule_copy
> (scop->original_schedule);
> -  return false;
> +  return flag_graphite_identity || flag_loop_parallelize_all;

Yes.

>  }
>
>return true;
>
> I'll test/commit the above.

ok.

>
>>
>> > Richard.
>> >
>> > 2017-09-26  Richard Biener  
>> >
>> > * graphite-scop-detection.c (scop_detection::build_scop_depth):
>> > Rewrite,
>> > fold in ...
>> > (scop_detection::build_scop_breadth): ... this.  Removed.
>> > (scop_detection::loop_is_valid_in_scop): Fold into single caller.
>> > (scop_detection::harmful_stmt_in_bb): Likewise.
>> > (scop_detection::graphite_can_represent_stmt): Likewise.
>> > (scop_detection::loop_body_is_valid_scop): Likewise.  Remove
>> > recursion.
>> > (scop_detection::can_represent_loop): Remove recursion, fold in
>> > ...
>> > (scop_detection::can_represent_loop_1): ... this.  Removed.
>> > (scop_detection::harmful_loop_in_region): Simplify after inlining
>> > the above and remove more quadraticness.
>> > (build_scops): Adjust.
>> > * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
>> > quadraticness.
>> >
>> >
>> This goes in the right direction: it cuts down compilation time.
>> As it is not a trivial change, I need some time to understand how
>> the scop detection works with this change.
>
> The only functional change should be that the SESE composition now
> works top-down instead of working its way bottom-up.  It's not clear
> whether we do more or less work that way

So we went from top-down to bottom-up,
and now with this change we go back to top-down.
I think both algorithms are equivalent in terms of number
of times we validate statements.

We explained the current implementation of the scop detection in:
http://impact.gforge.inria.fr/impact2016/papers/impact2016-kumar.pdf
http://impact.gforge.inria.fr/impact2016/papers/impact2016-kumar-slides.pdf

Here is what happens on an example:

loop_1 {
  loop_2 {
stmt_1
  }
  stmt_2
  loop_3 {
stmt_3
  }
}

- with a top down scop detection, we would start the analysis with loop_1,
and start validating that every stmt in its body (stmt_1, stmt_2,
and finally stmt_3) can be represented in the polyhedral representation.
If at any moment the analysis returns "cannot represent", it would go one
level down and try to validate the immediate sub loop loop_2.
Let's assume that stmt_1 can be represented, and so it would try to
extend the scop by validating stmt_2 and then its sibling loop_3, and say
we fail on validating stmt_3.  All done, max scop is stmt_1 in loop_2
followed by stmt_2.

- with a bottom up we would start from the inner loop_2, it passes
validation of stmt_1, then we extend the scop by validating stmt_2,
and then we fail at validation of stmt_3.  All done, max scop is stmt_1 in
loop_2 followed by stmt_2.  In the bottom-up process we don't
have to validate the outer loop_1.

Supposing that there is no fail in the process, then a top-down detection
would be faster as it does not need to validate one by one the inner loops:
it just goes in one pass over the stmts of loop_1 body.

> I think we can 

Re: [PATCH][GRAPHITE] Simplify SCOP detection

2017-09-27 Thread Richard Biener
On Tue, 26 Sep 2017, Sebastian Pop wrote:

> On Tue, Sep 26, 2017 at 7:03 AM, Richard Biener  wrote:
> 
> >
> > The following is the result of me trying to understand SCOP detection
> > and the validity checks spread around the machinery.  It removes several
> > quadraticnesses by folding validity checks into
> > scop_detection::harmful_loop_in_region where we already walk over all
> > BBs in the region and process individual found loops.
> >
> > It also rewrites build_scop_depth/build_scop_breadth into something
> > I can undestand.
> >
> > Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
> > for all langs is happy, so is SPEC CPU 2006 testing where the statistics
> > agree before/after the patch).
> >
> > I'll apply this after the bootstrap finished.
> >
> 
> Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"?

I do "-O2 -g -floop-nest-optimize" but I guess -fgraphite-identity
should catch more issues?  Hmm, maybe -floop-nest-optimize and
-fgraphite-identity should be combinable

Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c (revision 253203)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
print_schedule_ast (dump_file, scop->original_schedule, scop);
   isl_schedule_free (scop->transformed_schedule);
   scop->transformed_schedule = isl_schedule_copy 
(scop->original_schedule);
-  return false;
+  return flag_graphite_identity || flag_loop_parallelize_all;
 }
 
   return true;

I'll test/commit the above.

> 
> > Richard.
> >
> > 2017-09-26  Richard Biener  
> >
> > * graphite-scop-detection.c (scop_detection::build_scop_depth):
> > Rewrite,
> > fold in ...
> > (scop_detection::build_scop_breadth): ... this.  Removed.
> > (scop_detection::loop_is_valid_in_scop): Fold into single caller.
> > (scop_detection::harmful_stmt_in_bb): Likewise.
> > (scop_detection::graphite_can_represent_stmt): Likewise.
> > (scop_detection::loop_body_is_valid_scop): Likewise.  Remove
> > recursion.
> > (scop_detection::can_represent_loop): Remove recursion, fold in
> > ...
> > (scop_detection::can_represent_loop_1): ... this.  Removed.
> > (scop_detection::harmful_loop_in_region): Simplify after inlining
> > the above and remove more quadraticness.
> > (build_scops): Adjust.
> > * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
> > quadraticness.
> >
> >
> This goes in the right direction: it cuts down compilation time.
> As it is not a trivial change, I need some time to understand how
> the scop detection works with this change.

The only functional change should be that the SESE composition now
works top-down instead of working its way bottom-up.  It's not clear
whether we do more or less work that way but at least the function
is now readable -- I think we can structure it bottom-up as well
without doing the confusing two-function way (which I believe
did quite some duplicate work but I never was sure...).

Richard.


Re: [PATCH][GRAPHITE] Simplify SCOP detection

2017-09-26 Thread Sebastian Pop
On Tue, Sep 26, 2017 at 7:03 AM, Richard Biener  wrote:

>
> The following is the result of me trying to understand SCOP detection
> and the validity checks spread around the machinery.  It removes several
> quadraticnesses by folding validity checks into
> scop_detection::harmful_loop_in_region where we already walk over all
> BBs in the region and process individual found loops.
>
> It also rewrites build_scop_depth/build_scop_breadth into something
> I can undestand.
>
> Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
> for all langs is happy, so is SPEC CPU 2006 testing where the statistics
> agree before/after the patch).
>
> I'll apply this after the bootstrap finished.
>

Have you tried to bootstrap with BOOT_CFLAGS="-O2 -fgraphite-identity"?


> Richard.
>
> 2017-09-26  Richard Biener  
>
> * graphite-scop-detection.c (scop_detection::build_scop_depth):
> Rewrite,
> fold in ...
> (scop_detection::build_scop_breadth): ... this.  Removed.
> (scop_detection::loop_is_valid_in_scop): Fold into single caller.
> (scop_detection::harmful_stmt_in_bb): Likewise.
> (scop_detection::graphite_can_represent_stmt): Likewise.
> (scop_detection::loop_body_is_valid_scop): Likewise.  Remove
> recursion.
> (scop_detection::can_represent_loop): Remove recursion, fold in
> ...
> (scop_detection::can_represent_loop_1): ... this.  Removed.
> (scop_detection::harmful_loop_in_region): Simplify after inlining
> the above and remove more quadraticness.
> (build_scops): Adjust.
> * tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
> quadraticness.
>
>
This goes in the right direction: it cuts down compilation time.
As it is not a trivial change, I need some time to understand how
the scop detection works with this change.

Sebastian


[PATCH][GRAPHITE] Simplify SCOP detection

2017-09-26 Thread Richard Biener

The following is the result of me trying to understand SCOP detection
and the validity checks spread around the machinery.  It removes several
quadraticnesses by folding validity checks into 
scop_detection::harmful_loop_in_region where we already walk over all
BBs in the region and process individual found loops.

It also rewrites build_scop_depth/build_scop_breadth into something
I can undestand.

Bootstrap and regtest is running on x86_64-unknown-linux-gnu (graphite.exp
for all langs is happy, so is SPEC CPU 2006 testing where the statistics
agree before/after the patch).

I'll apply this after the bootstrap finished.

Richard.

2017-09-26  Richard Biener  

* graphite-scop-detection.c (scop_detection::build_scop_depth): Rewrite,
fold in ...
(scop_detection::build_scop_breadth): ... this.  Removed.
(scop_detection::loop_is_valid_in_scop): Fold into single caller.
(scop_detection::harmful_stmt_in_bb): Likewise.
(scop_detection::graphite_can_represent_stmt): Likewise.
(scop_detection::loop_body_is_valid_scop): Likewise.  Remove recursion.
(scop_detection::can_represent_loop): Remove recursion, fold in ...
(scop_detection::can_represent_loop_1): ... this.  Removed.
(scop_detection::harmful_loop_in_region): Simplify after inlining
the above and remove more quadraticness.
(build_scops): Adjust.
* tree-data-ref.c (loop_nest_has_data_refs): Remove pointless
quadraticness.


Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253199)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -362,17 +362,7 @@ public:
 
   /* Build scop outer->inner if possible.  */
 
-  sese_l build_scop_depth (sese_l s, loop_p loop);
-
-  /* If loop and loop->next are valid scops, try to merge them.  */
-
-  sese_l build_scop_breadth (sese_l s1, loop_p loop);
-
-  /* Return true when LOOP is a valid scop, that is a Static Control Part, a
- region of code that can be represented in the polyhedral model.  SCOP
- defines the region we analyse.  */
-
-  bool loop_is_valid_in_scop (loop_p loop, sese_l scop) const;
+  void build_scop_depth (loop_p loop);
 
   /* Return true when BEGIN is the preheader edge of a loop with a single exit
  END.  */
@@ -398,18 +388,6 @@ public:
 
   void remove_intersecting_scops (sese_l s1);
 
-  /* Return true when the body of LOOP has statements that can be represented
- as a valid scop.  */
-
-  bool loop_body_is_valid_scop (loop_p loop, sese_l scop) const;
-
-  /* Return true when BB contains a harmful operation for a scop: that
- can be a function call with side effects, the induction variables
- are not linear with respect to SCOP, etc.  The current open
- scop should end before this statement.  */
-
-  bool harmful_stmt_in_bb (sese_l scop, basic_block bb) const;
-
   /* Return true when a statement in SCOP cannot be represented by Graphite.
  The assumptions are that L1 dominates L2, and SCOP->entry dominates L1.
  Limit the number of bbs between adjacent loops to
@@ -467,19 +445,12 @@ public:
  FIXME: For the moment, graphite cannot be used on loops that iterate using
  induction variables that wrap.  */
 
-  static bool can_represent_loop_1 (loop_p loop, sese_l scop);
-
-  /* Return true when all the loops within LOOP can be represented by
- Graphite.  */
-
   static bool can_represent_loop (loop_p loop, sese_l scop);
 
   /* Returns the number of pbbs that are in loops contained in SCOP.  */
 
   static int nb_pbbs_in_loops (scop_p scop);
 
-  static bool graphite_can_represent_stmt (sese_l, gimple *, basic_block);
-
 private:
   vec scops;
 };
@@ -673,10 +644,6 @@ scop_detection::merge_sese (sese_l first
   return invalid_sese;
 }
 
-  /* Analyze all the BBs in new sese.  */
-  if (harmful_loop_in_region (combined))
-return invalid_sese;
-
   DEBUG_PRINT (dp << "[merged-sese] s1: "; print_sese (dump_file, combined));
 
   return combined;
@@ -684,71 +651,40 @@ scop_detection::merge_sese (sese_l first
 
 /* Build scop outer->inner if possible.  */
 
-sese_l
-scop_detection::build_scop_depth (sese_l s, loop_p loop)
-{
-  if (!loop)
-return s;
-
-  DEBUG_PRINT (dp << "[Depth loop_" << loop->num << "]\n");
-  s = build_scop_depth (s, loop->inner);
-
-  sese_l s2 = merge_sese (s, get_sese (loop));
-  if (!s2)
-{
-  /* s might be a valid scop, so return it and start analyzing from the
-adjacent loop.  */
-  build_scop_depth (invalid_sese, loop->next);
-  return s;
-}
-
-  if (!loop_is_valid_in_scop (loop, s2))
-return build_scop_depth (invalid_sese, loop->next);
-
-  return build_scop_breadth (s2, loop);
-}
-
-/* If loop and loop->next are valid scops, try to merge them.  */
-
-sese_l
-scop_detection::build_scop_breadth (sese_l s1, loop_p loop)
+void