Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)
On 02/15/2017 03:49 AM, Aldy Hernandez wrote: On 02/13/2017 07:15 PM, Jeff Law wrote: So it seems in your updated patch there is only one call where we ask for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location. But I don't see how asking for LOOP_EXIT_COMPLEX from that location would change whether or not we unroll any given loop (which is the core of bz64081). Am I missing something? Ughh, only the spaghetti that is this code? ;-). get_loop_location is only called once in the compiler, in decide_unrolling(). This call to get_loop_location() will set the loop description, particularly desc->simple_p where you point out. The "desc" persists, hanging off the loop structure and its the existence (or lack thereof) which enables/disables unrolling. ie, we ask for complex exits via the get_loop_location calls. So loops with complex exits will get a DESC and thus will get unrolled. Other unpleasant things may happen as a result of having a cached DESC as well. If we're going to go this direction, I think we need to store the exit type in the DESC, then verify that if we ask for simple exits, then only give back a DESC for loops with simple exits. If we ask for complex exits, then we can give back any cached DESC. Then every instance where we ask for a desc has to be checked for whether or not it is safe in the presence of complex exits and ask for the appropriate style. Later on down in decide_unrolling(), we decide the number of iterations, and use desc->simple_p to ignore the loop if it is not simple. decide_unroll_constant_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) decide_unroll_runtime_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) decide_unroll_stupid (loop, flags); Any one of these functions will bail if the loop description was not simple_p: I don't think so. If that were the case, then we'd never unroll the loop in the testcase. In fact, decide_unroll_constant_iterations unrolls the loop with the complex exit. Now a problem I see here is that decide_unroll_*_iterations all call get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since the value is already cached we return the previous call that was LOOP_EXIT_COMPLEX. So the code works because we will already have a cached value. Right. And ISTM it's the existence of the DESC structure that's the key here. Right? I think to make it clearer we could: 1. Add an assert in get_loop_desc to make sure that if we're returning a cached loop description, that the LOOP_EXIT_TYPEs match. Just in case... Probably not a bad idea. But you don't save the loop exit type AFAICT. In fact, if you did, you'd quickly see that the calls via decide_* would likely trigger your assert. 2. Change all the decide_unroll_*_iterations variants to specifically ask for a LOOP_EXIT_TYPE, not just the simple variant. And have this set to LOOP_EXIT_COMPLEX from decide_unrolling. Right now, this is all working because we have only one call to get_loop_location, but I assume that could change. But the simple variants all ask for LOOP_EXIT_SIMPLE, so I don't see this as improving things significantly. 3. And finally, what the heck is get_loop_location doing in cfgloop, when it's only used once within loop-unroll.c? I say we move it to loop-unroll.c and mark it static. Seems like a reasonable cleanup. Does this help? It does. But it really shows that the introduction of simple vs complex exits is messed up. All we've done is add another layer of complexity that happens to work IMHO, but it seems to do so more by accident than design. Jeff
Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)
On 02/13/2017 07:15 PM, Jeff Law wrote: So it seems in your updated patch there is only one call where we ask for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location. But I don't see how asking for LOOP_EXIT_COMPLEX from that location would change whether or not we unroll any given loop (which is the core of bz64081). Am I missing something? Ughh, only the spaghetti that is this code? ;-). get_loop_location is only called once in the compiler, in decide_unrolling(). This call to get_loop_location() will set the loop description, particularly desc->simple_p where you point out. Later on down in decide_unrolling(), we decide the number of iterations, and use desc->simple_p to ignore the loop if it is not simple. decide_unroll_constant_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) decide_unroll_runtime_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) decide_unroll_stupid (loop, flags); Any one of these functions will bail if the loop description was not simple_p: /* Check for simple loops. */ desc = get_simple_loop_desc (loop); /* Check simpleness. */ if (!desc->simple_p || desc->assumptions) { if (dump_file) fprintf (dump_file, ";; Unable to prove that the number of iterations " "can be counted in runtime\n"); return; } (Yes, there's a lot of duplicated code in decide_unroll_*_iterations.) Now a problem I see here is that decide_unroll_*_iterations all call get_simple_loop_desc() which is basically LOOP_EXIT_SIMPLE, but since the value is already cached we return the previous call that was LOOP_EXIT_COMPLEX. So the code works because we will already have a cached value. I think to make it clearer we could: 1. Add an assert in get_loop_desc to make sure that if we're returning a cached loop description, that the LOOP_EXIT_TYPEs match. Just in case... 2. Change all the decide_unroll_*_iterations variants to specifically ask for a LOOP_EXIT_TYPE, not just the simple variant. And have this set to LOOP_EXIT_COMPLEX from decide_unrolling. Right now, this is all working because we have only one call to get_loop_location, but I assume that could change. 3. And finally, what the heck is get_loop_location doing in cfgloop, when it's only used once within loop-unroll.c? I say we move it to loop-unroll.c and mark it static. Does this help? Aldy
Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)
On 02/09/2017 11:08 AM, Aldy Hernandez wrote: For those of you not following the PR, this is a re-post of a patch that was approved in some form a year+ ago, but was reverted because it caused an undiagnosed bootstrap problem on AIX: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html Annoyingly, the AIX problem disappeared with some unrelated changes to SCCVN by Richard in more recent GCC's. Consequently, the patch got blocked because we could no longer reproduce the problem, but it wasn't entirely clear that it was gone. As penance for sins in a previous life, I have taken it upon myself to reproduce this problem back in time and find what caused the AIX failure back then. I will spare the list the boring process, but the problem with the original patch was that it changed the semantics of check_simple_exit, but not all of it's indirect callers. This caused two versions of the same loop to have different unsynchronized iteration variables-- one in a simple register, and one in a doloop variant on ppc. I have bootstrapped this patch around trunk@226811 on AIX (all languages), which is the latest time AIX failed to bootstrap with the original patch. My testing environment included a handful of other unrelated changes that were required to coerce AIX into building GCC5-ish with GCC6.1. I have also bootstrapped and tested this patch at today's trunk on x86-64 Linux with no adverse effects. As I've mentioned, I believe the original patch was previously approved, so to aid in reviewing I am including the full patch ("full-patch") with my changes on top of the original patch, as well as an incremental patch ("incremental-patch") with my recent changes to bring AIX to happiness. I understand if changes to the RTL looping infrastructure are too late for this release cycle, and as I am not the original author-- don't kill the messenger! I just want to move this forward, even if it means getting approval for GCC8. This will at the very least allow me to sleep at night, knowing I won't have to touch AIX for a while (or at least wrt to this PR) ;-). Original author(s) CC'ed. So it seems in your updated patch there is only one call where we ask for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location. But I don't see how asking for LOOP_EXIT_COMPLEX from that location would change whether or not we unroll any given loop (which is the core of bz64081). Am I missing something? Jeff
PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)
For those of you not following the PR, this is a re-post of a patch that was approved in some form a year+ ago, but was reverted because it caused an undiagnosed bootstrap problem on AIX: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00421.html Annoyingly, the AIX problem disappeared with some unrelated changes to SCCVN by Richard in more recent GCC's. Consequently, the patch got blocked because we could no longer reproduce the problem, but it wasn't entirely clear that it was gone. As penance for sins in a previous life, I have taken it upon myself to reproduce this problem back in time and find what caused the AIX failure back then. I will spare the list the boring process, but the problem with the original patch was that it changed the semantics of check_simple_exit, but not all of it's indirect callers. This caused two versions of the same loop to have different unsynchronized iteration variables-- one in a simple register, and one in a doloop variant on ppc. I have bootstrapped this patch around trunk@226811 on AIX (all languages), which is the latest time AIX failed to bootstrap with the original patch. My testing environment included a handful of other unrelated changes that were required to coerce AIX into building GCC5-ish with GCC6.1. I have also bootstrapped and tested this patch at today's trunk on x86-64 Linux with no adverse effects. As I've mentioned, I believe the original patch was previously approved, so to aid in reviewing I am including the full patch ("full-patch") with my changes on top of the original patch, as well as an incremental patch ("incremental-patch") with my recent changes to bring AIX to happiness. I understand if changes to the RTL looping infrastructure are too late for this release cycle, and as I am not the original author-- don't kill the messenger! I just want to move this forward, even if it means getting approval for GCC8. This will at the very least allow me to sleep at night, knowing I won't have to touch AIX for a while (or at least wrt to this PR) ;-). Original author(s) CC'ed. Bring it! Aldy PR rtl-optimization/64081 * loop-iv.c (def_pred_latch_p): New function. (latch_dominating_def): Allow specific cases with non-single definitions. (iv_get_reaching_def): Likewise. (check_complex_exit_p): New function. (check_simple_exit): Use check_complex_exit_p to allow certain cases with exits not executing on any iteration. Rename to check_loop_exit. (find_simple_exit): Rename to find_loop_exit. (get_simple_loop_desc): Rename to get_loop_desc. * cfgloop.c (get_loop_location): Use get_loop_desc instead of get_simple_loop_desc. * cfgloop.h (enum loop_exit_type): New. (find_loop_exit): New prototype. (find_simple_exit): Define as inline to use find_loop_exit. (get_loop_desc): New prototype. (get_simple_loop_desc): Define as inline to use get_loop_desc. diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c index afd56bb..df9d394 100644 --- a/gcc/cfgloop.c +++ b/gcc/cfgloop.c @@ -1742,9 +1742,9 @@ get_loop_location (struct loop *loop) of the for or while statement, if possible. To do this, look for the branch guarding the loop back-edge. */ - /* If this is a simple loop with an in_edge, then the loop control - branch is typically at the end of its source. */ - desc = get_simple_loop_desc (loop); + /* If this is a loop with an in_edge, then the loop control branch + is typically at the end of its source. */ + desc = get_loop_desc (loop, LOOP_EXIT_COMPLEX); if (desc->in_edge) { FOR_BB_INSNS_REVERSE (desc->in_edge->src, insn) diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 80a1915..583d247 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -460,6 +460,16 @@ struct GTY(()) niter_desc rtx niter_expr; }; +enum loop_exit_type +{ + /* Simple exit out of a loop. */ + LOOP_EXIT_SIMPLE, + + /* Simple, but slightly more complex loop exit sometimes allowed for + further loop analysis. See check_complex_exit_p. */ + LOOP_EXIT_COMPLEX +}; + extern void iv_analysis_loop_init (struct loop *); extern bool iv_analyze (rtx_insn *, rtx, struct rtx_iv *); extern bool iv_analyze_result (rtx_insn *, rtx, struct rtx_iv *); @@ -467,12 +477,30 @@ extern bool iv_analyze_expr (rtx_insn *, rtx, machine_mode, struct rtx_iv *); extern rtx get_iv_value (struct rtx_iv *, rtx); extern bool biv_p (rtx_insn *, rtx); -extern void find_simple_exit (struct loop *, struct niter_desc *); +extern void find_loop_exit (struct loop *, struct niter_desc *, + enum loop_exit_type); extern void iv_analysis_done (void); -extern struct niter_desc *get_simple_loop_desc (struct loop *loop); +extern struct niter_desc *get_loop_desc (struct loop *loop, +enum loop_exit_type); extern void