Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)

2017-02-22 Thread Jeff Law

On 02/15/2017 03:49 AM, Aldy Hernandez wrote:

On 02/13/2017 07:15 PM, Jeff Law wrote:


So it seems in your updated patch there is only one call where we ask
for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.

But I don't see how asking for LOOP_EXIT_COMPLEX from that location
would change whether or not we unroll any given loop (which is the core
of bz64081).

Am I missing something?


Ughh, only the spaghetti that is this code? ;-).

get_loop_location is only called once in the compiler, in
decide_unrolling().  This call to get_loop_location() will set the loop
description, particularly desc->simple_p where you point out.
The "desc" persists, hanging off the loop structure and its the 
existence (or lack thereof) which enables/disables unrolling.


ie, we ask for complex exits via the get_loop_location calls.  So loops 
with complex exits will get a DESC and thus will get unrolled.  Other 
unpleasant things may happen as a result of having a cached DESC as well.


If we're going to go this direction, I think we need to store the exit 
type in the DESC, then verify that if we ask for simple exits, then only 
give back a DESC for loops with simple exits.  If we ask for complex 
exits, then we can give back any cached DESC.


Then every instance where we ask for a desc has to be checked for 
whether or not it is safe in the presence of complex exits and ask for 
the appropriate style.






Later on down in decide_unrolling(), we decide the number of iterations,
and use desc->simple_p to ignore the loop if it is not simple.

  decide_unroll_constant_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_runtime_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_stupid (loop, flags);

Any one of these functions will bail if the loop description was not
simple_p:
I don't think so.  If that were the case, then we'd never unroll the 
loop in the testcase.  In fact, decide_unroll_constant_iterations 
unrolls the loop with the complex exit.






Now a problem I see here is that decide_unroll_*_iterations all call
get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since
the value is already cached we return the previous call that was
LOOP_EXIT_COMPLEX.  So the code works because we will already have a
cached value.
Right.  And ISTM it's the existence of the DESC structure that's the key 
here.  Right?




I think to make it clearer we could:

1. Add an assert in get_loop_desc to make sure that if we're returning a
cached loop description, that the LOOP_EXIT_TYPEs match.  Just in case...
Probably not a bad idea.  But you don't save the loop exit type AFAICT. 
In fact, if you did, you'd quickly see that the calls via decide_* would 
likely trigger your assert.






2. Change all the decide_unroll_*_iterations variants to specifically
ask for a LOOP_EXIT_TYPE, not just  the simple variant. And have this
set to LOOP_EXIT_COMPLEX from decide_unrolling.  Right now, this is all
working because we have only one call to get_loop_location, but I assume
that could change.
But the simple variants all ask for LOOP_EXIT_SIMPLE, so I don't see 
this as improving things significantly.




3. And finally, what the heck is get_loop_location doing in cfgloop,
when it's only used once within loop-unroll.c?  I say we move it to
loop-unroll.c and mark it static.

Seems like a reasonable cleanup.




Does this help?
It does.  But it really shows that the introduction of simple vs complex 
exits is messed up.  All we've done is add another layer of complexity 
that happens to work IMHO, but it seems to do so more by accident than 
design.


Jeff


Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)

2017-02-15 Thread Aldy Hernandez

On 02/13/2017 07:15 PM, Jeff Law wrote:


So it seems in your updated patch there is only one call where we ask
for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.

But I don't see how asking for LOOP_EXIT_COMPLEX from that location
would change whether or not we unroll any given loop (which is the core
of bz64081).

Am I missing something?


Ughh, only the spaghetti that is this code? ;-).

get_loop_location is only called once in the compiler, in 
decide_unrolling().  This call to get_loop_location() will set the loop 
description, particularly desc->simple_p where you point out.


Later on down in decide_unrolling(), we decide the number of iterations, 
and use desc->simple_p to ignore the loop if it is not simple.


  decide_unroll_constant_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_runtime_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_stupid (loop, flags);

Any one of these functions will bail if the loop description was not 
simple_p:


  /* Check for simple loops.  */
  desc = get_simple_loop_desc (loop);

  /* Check simpleness.  */
  if (!desc->simple_p || desc->assumptions)
{
  if (dump_file)
fprintf (dump_file,
 ";; Unable to prove that the number of iterations "
 "can be counted in runtime\n");
  return;
}

(Yes, there's a lot of duplicated code in decide_unroll_*_iterations.)

Now a problem I see here is that decide_unroll_*_iterations all call 
get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since 
the value is already cached we return the previous call that was 
LOOP_EXIT_COMPLEX.  So the code works because we will already have a 
cached value.


I think to make it clearer we could:

1. Add an assert in get_loop_desc to make sure that if we're returning a 
cached loop description, that the LOOP_EXIT_TYPEs match.  Just in case...


2. Change all the decide_unroll_*_iterations variants to specifically 
ask for a LOOP_EXIT_TYPE, not just  the simple variant. And have this 
set to LOOP_EXIT_COMPLEX from decide_unrolling.  Right now, this is all 
working because we have only one call to get_loop_location, but I assume 
that could change.


3. And finally, what the heck is get_loop_location doing in cfgloop, 
when it's only used once within loop-unroll.c?  I say we move it to 
loop-unroll.c and mark it static.


Does this help?

Aldy


Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)

2017-02-13 Thread Jeff Law

On 02/09/2017 11:08 AM, Aldy Hernandez wrote:

For those of you not following the PR, this is a re-post of a patch that
was approved in some form a year+ ago, but was reverted because it
caused an undiagnosed bootstrap problem on AIX:

https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html

Annoyingly, the AIX problem disappeared with some unrelated changes to
SCCVN by Richard in more recent GCC's.  Consequently, the patch got
blocked because we could no longer reproduce the problem, but it wasn't
entirely clear that it was gone.

As penance for sins in a previous life, I have taken it upon myself to
reproduce this problem back in time and find what caused the AIX failure
back then.  I will spare the list the boring process, but the problem
with the original patch was that it changed the semantics of
check_simple_exit, but not all of it's indirect callers.  This caused
two versions of the same loop to have different unsynchronized iteration
variables-- one in a simple register, and one in a doloop variant on ppc.

I have bootstrapped this patch around trunk@226811 on AIX (all
languages), which is the latest time AIX failed to bootstrap with the
original patch.  My testing environment included a handful of other
unrelated changes that were required to coerce AIX into building
GCC5-ish with GCC6.1.

I have also bootstrapped and tested this patch at today's trunk on
x86-64 Linux with no adverse effects.

As I've mentioned, I believe the original patch was previously approved,
so to aid in reviewing I am including the full patch ("full-patch") with
my changes on top of the original patch, as well as an incremental patch
("incremental-patch") with my recent changes to bring AIX to happiness.

I understand if changes to the RTL looping infrastructure are too late
for this release cycle, and as I am not the original author-- don't kill
the messenger!  I just want to move this forward, even if it means
getting approval for GCC8.  This will at the very least allow me to
sleep at night, knowing I won't have to touch AIX for a while (or at
least wrt to this PR) ;-).

Original author(s) CC'ed.
So it seems in your updated patch there is only one call where we ask 
for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.


But I don't see how asking for LOOP_EXIT_COMPLEX from that location 
would change whether or not we unroll any given loop (which is the core 
of bz64081).


Am I missing something?

Jeff


PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)

2017-02-09 Thread Aldy Hernandez
For those of you not following the PR, this is a re-post of a patch that 
was approved in some form a year+ ago, but was reverted because it 
caused an undiagnosed bootstrap problem on AIX:


https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html

Annoyingly, the AIX problem disappeared with some unrelated changes to 
SCCVN by Richard in more recent GCC's.  Consequently, the patch got 
blocked because we could no longer reproduce the problem, but it wasn't 
entirely clear that it was gone.


As penance for sins in a previous life, I have taken it upon myself to 
reproduce this problem back in time and find what caused the AIX failure 
back then.  I will spare the list the boring process, but the problem 
with the original patch was that it changed the semantics of 
check_simple_exit, but not all of it's indirect callers.  This caused 
two versions of the same loop to have different unsynchronized iteration 
variables-- one in a simple register, and one in a doloop variant on ppc.


I have bootstrapped this patch around trunk@226811 on AIX (all 
languages), which is the latest time AIX failed to bootstrap with the 
original patch.  My testing environment included a handful of other 
unrelated changes that were required to coerce AIX into building 
GCC5-ish with GCC6.1.


I have also bootstrapped and tested this patch at today's trunk on 
x86-64 Linux with no adverse effects.


As I've mentioned, I believe the original patch was previously approved, 
so to aid in reviewing I am including the full patch ("full-patch") with 
my changes on top of the original patch, as well as an incremental patch 
("incremental-patch") with my recent changes to bring AIX to happiness.


I understand if changes to the RTL looping infrastructure are too late 
for this release cycle, and as I am not the original author-- don't kill 
the messenger!  I just want to move this forward, even if it means 
getting approval for GCC8.  This will at the very least allow me to 
sleep at night, knowing I won't have to touch AIX for a while (or at 
least wrt to this PR) ;-).


Original author(s) CC'ed.

Bring it!
Aldy
PR rtl-optimization/64081
* loop-iv.c (def_pred_latch_p): New function.
(latch_dominating_def): Allow specific cases with non-single
definitions.
(iv_get_reaching_def): Likewise.
(check_complex_exit_p): New function.
(check_simple_exit): Use check_complex_exit_p to allow certain cases
with exits not executing on any iteration.
Rename to check_loop_exit.
(find_simple_exit): Rename to find_loop_exit.
(get_simple_loop_desc): Rename to get_loop_desc.
* cfgloop.c (get_loop_location): Use get_loop_desc instead of
get_simple_loop_desc.
* cfgloop.h (enum loop_exit_type): New.
(find_loop_exit): New prototype.
(find_simple_exit): Define as inline to use find_loop_exit.
(get_loop_desc): New prototype.
(get_simple_loop_desc): Define as inline to use get_loop_desc.

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index afd56bb..df9d394 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1742,9 +1742,9 @@ get_loop_location (struct loop *loop)
  of the for or while statement, if possible.  To do this, look
  for the branch guarding the loop back-edge.  */
 
-  /* If this is a simple loop with an in_edge, then the loop control
- branch is typically at the end of its source.  */
-  desc = get_simple_loop_desc (loop);
+  /* If this is a loop with an in_edge, then the loop control branch
+ is typically at the end of its source.  */
+  desc = get_loop_desc (loop, LOOP_EXIT_COMPLEX);
   if (desc->in_edge)
 {
   FOR_BB_INSNS_REVERSE (desc->in_edge->src, insn)
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 80a1915..583d247 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -460,6 +460,16 @@ struct GTY(()) niter_desc
   rtx niter_expr;
 };
 
+enum loop_exit_type
+{
+  /* Simple exit out of a loop.  */
+  LOOP_EXIT_SIMPLE,
+
+  /* Simple, but slightly more complex loop exit sometimes allowed for
+ further loop analysis.  See check_complex_exit_p.  */
+  LOOP_EXIT_COMPLEX
+};
+
 extern void iv_analysis_loop_init (struct loop *);
 extern bool iv_analyze (rtx_insn *, rtx, struct rtx_iv *);
 extern bool iv_analyze_result (rtx_insn *, rtx, struct rtx_iv *);
@@ -467,12 +477,30 @@ extern bool iv_analyze_expr (rtx_insn *, rtx, 
machine_mode,
 struct rtx_iv *);
 extern rtx get_iv_value (struct rtx_iv *, rtx);
 extern bool biv_p (rtx_insn *, rtx);
-extern void find_simple_exit (struct loop *, struct niter_desc *);
+extern void find_loop_exit (struct loop *, struct niter_desc *,
+   enum loop_exit_type);
 extern void iv_analysis_done (void);
 
-extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
+extern struct niter_desc *get_loop_desc (struct loop *loop,
+enum loop_exit_type);
 extern void