Re: [PATCH][RFC] Early phiopt pass
On Mon, 22 Oct 2018, Richard Biener wrote: > On Wed, 29 Aug 2018, Richard Biener wrote: > > > On Wed, 29 Aug 2018, Jeff Law wrote: > > > > > On 08/29/2018 04:56 AM, Richard Biener wrote: > > > > > > > > In response to PR87105 I dusted off an old patch that adds an early > > > > phiopt pass. Recognizing we run phiopt twice with not many passes > > > > in between early in post-IPA optimizations this patch moves the > > > > first of said to the early pipeline. > > > > > > > > The main motivation is to do things like MIN/MAX_EXPR early to > > > > avoid jump threading mess up the CFG (the case with PR87105). > > > > I realize theres early backward threading before the new early > > > > phiopt pass but that doesn't seem to do anything useful there (yet). > > > > I think it makes sense to push that later anyways. > > > > > > > > Now, early phiopt is quite confused about predict stmts still > > > > being present and turning conditional BBs into diamonds which it > > > > cannot handle. I've fixed at least stray such stmts in the BBs > > > > that are interesting. Note this may hide fallout which would otherwise > > > > be visible in the testsuite (there's no flag to avoid > > > > generating the predictors - they are emitted directly by the frontends, > > > > maybe we could drop them with -fno[-guess]-branch-probabilities at > > > > gimplification time?). > > > > > > > > There's also an effect on ifcombine which, when preceeded by phiopt, > > > > can miss cases because phiopt may value-replace some condition. > > > > > > > > The patch contains adjustments to testcases where there's no harm done > > > > in the end and leaves those FAILing where we would need to do sth. > > > > > > > > In the end it's regular pass-ordering issues but our testsuite very > > > > often ties our hands when re-ordering passes because of them. > > > > > > > > One option would be to distinguish early from late phiopt and for > > > > example avoid value-replacement - like just do MIN/MAX recognition > > > > for the vectorizer. > > > > > > > > Any comments? > > > > > > > > Some detailed notes on the remaining FAILs below. > > > [ ... ] > > > I didn't see anything in the testsuite fallout that gave me significant > > > concern. If your judgment is that we're better off running it earlier, > > > then let's do it. > > > > I guess so. I'll add an early variant anyway since we don't want to > > do adjacent load hoisting early. I'll see how difficult it is to handle > > ifcombine merging with straight-line code or catch those cases elsewhere. > > So I'm finally returning to this... > > I failed to find a sequence of VRP (jump threading), ifcombine and phiopt > that avoids regressions (well, the current order works of course). So > instead of moving the first phiopt pass I am now _adding_ a phiopt > pass early doing _only_ min/max/abs replacement. Those are wrecked > by jump-threading if they happen in sequences with some common operands. > > Thus the following patch is now in bootstrap & regtest on > x86_64-unknown-linux-gnu. > > If that goes well I plan to install the patch after digging out > the relevant two-to-three PRs and adding testcases for them. The following is what I have applied. Bootstrapped and tested on x86_64-unknown-linux-gnu. Richard. >From 83f89c82650bdfa057bde3ef1a236338a8cdd848 Mon Sep 17 00:00:00 2001 From: Richard Guenther Date: Tue, 28 Aug 2018 12:53:53 +0200 Subject: [PATCH] early-phiopt FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++98 scan-tree-dump-times profile_es timate "loop exit heuristics:" 2 phiopt value-replaces the last "exit" test in : _5 = foo (); if (_5 != 0) goto ; else goto ; : g.0_7 = g; if (g.0_7 <= 9) goto ; else goto ; : : # iftmp.3_1 = PHI <1(5), 0(6), 1(4)> if (iftmp.3_1 != 0) goto ; else goto ; : return; to look like : g.0_7 = g; _6 = g.0_7 <= 9; : # iftmp.3_1 = PHI <_6(5), 1(4)> if (iftmp.3_1 != 0) goto ; which confuses whatever the testcase was supposed to check (there is only one loop exit). FAIL: gcc.dg/tree-ssa/ssa-pre-32.c scan-tree-dump pre "# prephitmp_[0-9]+ = PHI <[xy]_[0-9]+(D)[^,]*, [xy]_[0-9]+(D)" phiopt optimizes : if (b_5(D) != 0) goto ; [INV] else goto ; [INV] : : # iftmp.0_3 = PHI <4294967295(2), 0(3)> _1 = iftmp.0_3 & x_8(D); to _9 = (unsigned int) b_5(D); _14 = -_9; _1 = x_8(D) & _14; but in the sequence with two same conditions but opposite bits nothing optimizes this further back to b_5(D) ? x_6(D) : y_7(D). FAIL: gcc.dg/tree-ssa/ssa-ifcombine-7.c scan-tree-dump ifcombine " > " FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c scan-tree-dump optimized "&" FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c scan-tree-dump optimized "&" FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c scan-tree-dump-times optimized "&" 2 \|" 2 early phiopt value-replaces the inner condition which leaves ifcombine with no work.
Re: [PATCH][RFC] Early phiopt pass
On Wed, 29 Aug 2018, Richard Biener wrote: > On Wed, 29 Aug 2018, Jeff Law wrote: > > > On 08/29/2018 04:56 AM, Richard Biener wrote: > > > > > > In response to PR87105 I dusted off an old patch that adds an early > > > phiopt pass. Recognizing we run phiopt twice with not many passes > > > in between early in post-IPA optimizations this patch moves the > > > first of said to the early pipeline. > > > > > > The main motivation is to do things like MIN/MAX_EXPR early to > > > avoid jump threading mess up the CFG (the case with PR87105). > > > I realize theres early backward threading before the new early > > > phiopt pass but that doesn't seem to do anything useful there (yet). > > > I think it makes sense to push that later anyways. > > > > > > Now, early phiopt is quite confused about predict stmts still > > > being present and turning conditional BBs into diamonds which it > > > cannot handle. I've fixed at least stray such stmts in the BBs > > > that are interesting. Note this may hide fallout which would otherwise > > > be visible in the testsuite (there's no flag to avoid > > > generating the predictors - they are emitted directly by the frontends, > > > maybe we could drop them with -fno[-guess]-branch-probabilities at > > > gimplification time?). > > > > > > There's also an effect on ifcombine which, when preceeded by phiopt, > > > can miss cases because phiopt may value-replace some condition. > > > > > > The patch contains adjustments to testcases where there's no harm done > > > in the end and leaves those FAILing where we would need to do sth. > > > > > > In the end it's regular pass-ordering issues but our testsuite very > > > often ties our hands when re-ordering passes because of them. > > > > > > One option would be to distinguish early from late phiopt and for > > > example avoid value-replacement - like just do MIN/MAX recognition > > > for the vectorizer. > > > > > > Any comments? > > > > > > Some detailed notes on the remaining FAILs below. > > [ ... ] > > I didn't see anything in the testsuite fallout that gave me significant > > concern. If your judgment is that we're better off running it earlier, > > then let's do it. > > I guess so. I'll add an early variant anyway since we don't want to > do adjacent load hoisting early. I'll see how difficult it is to handle > ifcombine merging with straight-line code or catch those cases elsewhere. So I'm finally returning to this... I failed to find a sequence of VRP (jump threading), ifcombine and phiopt that avoids regressions (well, the current order works of course). So instead of moving the first phiopt pass I am now _adding_ a phiopt pass early doing _only_ min/max/abs replacement. Those are wrecked by jump-threading if they happen in sequences with some common operands. Thus the following patch is now in bootstrap & regtest on x86_64-unknown-linux-gnu. If that goes well I plan to install the patch after digging out the relevant two-to-three PRs and adding testcases for them. Richard. 2018-10-22 Richard Biener * passes.def (pass_all_early_optimizations): Add early phi-opt after dce. * tree-ssa-phiopt.c (value_replacement): Ignore NOPs and predicts in addition to debug stmts. (tree_ssa_phiopt_worker): Add early_p argument, do only min/max and abs replacement early. * tree-cfg.c (gimple_empty_block_p): Likewise. * g++.dg/tree-ssa/pr21463.C: Scan phiopt2 because this testcase relies on phiprop run before. * g++.dg/tree-ssa/pr30738.C: Likewise. * g++.dg/tree-ssa/pr57380.C: Likewise. * gcc.dg/tree-ssa/pr84859.c: Likewise. * gcc.dg/tree-ssa/pr45397.c: Scan phiopt2 because phiopt1 is confused by copies in the IL left by EVRP. * gcc.dg/tree-ssa/phi-opt-5.c: Likewise, this time confused by predictors. * gcc.dg/tree-ssa/phi-opt-12.c: Scan phiopt2. diff --git a/gcc/passes.def b/gcc/passes.def index 7f4b3479a35..24f212c8e31 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -88,6 +88,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce); + NEXT_PASS (pass_phiopt, true /* early_p */); NEXT_PASS (pass_early_ipa_sra); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_convert_switch); @@ -208,7 +209,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_merge_phi); - NEXT_PASS (pass_phiopt); + NEXT_PASS (pass_phiopt, false /* early_p */); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_ch); NEXT_PASS (pass_lower_complex); @@ -237,7 +238,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_reassoc, true /* insert_powi_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); -
Re: [PATCH][RFC] Early phiopt pass
On Wed, 29 Aug 2018, Jeff Law wrote: > On 08/29/2018 04:56 AM, Richard Biener wrote: > > > > In response to PR87105 I dusted off an old patch that adds an early > > phiopt pass. Recognizing we run phiopt twice with not many passes > > in between early in post-IPA optimizations this patch moves the > > first of said to the early pipeline. > > > > The main motivation is to do things like MIN/MAX_EXPR early to > > avoid jump threading mess up the CFG (the case with PR87105). > > I realize theres early backward threading before the new early > > phiopt pass but that doesn't seem to do anything useful there (yet). > > I think it makes sense to push that later anyways. > > > > Now, early phiopt is quite confused about predict stmts still > > being present and turning conditional BBs into diamonds which it > > cannot handle. I've fixed at least stray such stmts in the BBs > > that are interesting. Note this may hide fallout which would otherwise > > be visible in the testsuite (there's no flag to avoid > > generating the predictors - they are emitted directly by the frontends, > > maybe we could drop them with -fno[-guess]-branch-probabilities at > > gimplification time?). > > > > There's also an effect on ifcombine which, when preceeded by phiopt, > > can miss cases because phiopt may value-replace some condition. > > > > The patch contains adjustments to testcases where there's no harm done > > in the end and leaves those FAILing where we would need to do sth. > > > > In the end it's regular pass-ordering issues but our testsuite very > > often ties our hands when re-ordering passes because of them. > > > > One option would be to distinguish early from late phiopt and for > > example avoid value-replacement - like just do MIN/MAX recognition > > for the vectorizer. > > > > Any comments? > > > > Some detailed notes on the remaining FAILs below. > [ ... ] > I didn't see anything in the testsuite fallout that gave me significant > concern. If your judgment is that we're better off running it earlier, > then let's do it. I guess so. I'll add an early variant anyway since we don't want to do adjacent load hoisting early. I'll see how difficult it is to handle ifcombine merging with straight-line code or catch those cases elsewhere. Richard.
Re: [PATCH][RFC] Early phiopt pass
On 08/29/2018 04:56 AM, Richard Biener wrote: > > In response to PR87105 I dusted off an old patch that adds an early > phiopt pass. Recognizing we run phiopt twice with not many passes > in between early in post-IPA optimizations this patch moves the > first of said to the early pipeline. > > The main motivation is to do things like MIN/MAX_EXPR early to > avoid jump threading mess up the CFG (the case with PR87105). > I realize theres early backward threading before the new early > phiopt pass but that doesn't seem to do anything useful there (yet). > I think it makes sense to push that later anyways. > > Now, early phiopt is quite confused about predict stmts still > being present and turning conditional BBs into diamonds which it > cannot handle. I've fixed at least stray such stmts in the BBs > that are interesting. Note this may hide fallout which would otherwise > be visible in the testsuite (there's no flag to avoid > generating the predictors - they are emitted directly by the frontends, > maybe we could drop them with -fno[-guess]-branch-probabilities at > gimplification time?). > > There's also an effect on ifcombine which, when preceeded by phiopt, > can miss cases because phiopt may value-replace some condition. > > The patch contains adjustments to testcases where there's no harm done > in the end and leaves those FAILing where we would need to do sth. > > In the end it's regular pass-ordering issues but our testsuite very > often ties our hands when re-ordering passes because of them. > > One option would be to distinguish early from late phiopt and for > example avoid value-replacement - like just do MIN/MAX recognition > for the vectorizer. > > Any comments? > > Some detailed notes on the remaining FAILs below. [ ... ] I didn't see anything in the testsuite fallout that gave me significant concern. If your judgment is that we're better off running it earlier, then let's do it. Jeff
Re: [PATCH][RFC] Early phiopt pass
> > In response to PR87105 I dusted off an old patch that adds an early > phiopt pass. Recognizing we run phiopt twice with not many passes > in between early in post-IPA optimizations this patch moves the > first of said to the early pipeline. > > The main motivation is to do things like MIN/MAX_EXPR early to > avoid jump threading mess up the CFG (the case with PR87105). > I realize theres early backward threading before the new early > phiopt pass but that doesn't seem to do anything useful there (yet). > I think it makes sense to push that later anyways. This looks like good plan overall. > > Now, early phiopt is quite confused about predict stmts still > being present and turning conditional BBs into diamonds which it > cannot handle. I've fixed at least stray such stmts in the BBs > that are interesting. Note this may hide fallout which would otherwise > be visible in the testsuite (there's no flag to avoid > generating the predictors - they are emitted directly by the frontends, > maybe we could drop them with -fno[-guess]-branch-probabilities at > gimplification time?). That would be fine with me. We could also just add guess-branch-probaiblities guards and not produce them from FE. > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c > index cf12cb1f391..938ad7d9a44 100644 > --- a/gcc/tree-cfg.c > +++ b/gcc/tree-cfg.c > @@ -6106,11 +6106,19 @@ gimple_empty_block_p (basic_block bb) >gimple_stmt_iterator gsi = gsi_after_labels (bb); >if (phi_nodes (bb)) > return false; > - if (gsi_end_p (gsi)) > -return true; > - if (is_gimple_debug (gsi_stmt (gsi))) > -gsi_next_nondebug (); > - return gsi_end_p (gsi); > + while (!gsi_end_p (gsi)) > +{ > + gimple *stmt = gsi_stmt (gsi); > + if (is_gimple_debug (stmt)) > + ; > + else if (gimple_code (stmt) == GIMPLE_NOP > +|| gimple_code (stmt) == GIMPLE_PREDICT) > + ; > + else > + return false; > + gsi_next (); > +} > + return true; I think considering GIMPLE_PREDICT as empty basic blocks is going to make cfgcleanup to drop them in cases it should not (it will simply forward edges across them). I guess we could add new flag to gimple_empty_block_p whether to skip them or not defaulting to false. Once you ifconvert diamond you want to drop all predict statemetns inside of the diamond because they used to control the condtional of the diamond and now they control something else. Honza > } > > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c > index 1667bad873b..bbefc21dc13 100644 > --- a/gcc/tree-ssa-phiopt.c > +++ b/gcc/tree-ssa-phiopt.c > @@ -913,7 +913,9 @@ value_replacement (basic_block cond_bb, basic_block > middle_bb, >gsi_next_nondebug (); >if (!is_gimple_assign (stmt)) > { > - emtpy_or_with_defined_p = false; > + if (gimple_code (stmt) != GIMPLE_PREDICT > + && gimple_code (stmt) != GIMPLE_NOP) > + emtpy_or_with_defined_p = false; > continue; > } >/* Now try to adjust arg0 or arg1 according to the computation
[PATCH][RFC] Early phiopt pass
In response to PR87105 I dusted off an old patch that adds an early phiopt pass. Recognizing we run phiopt twice with not many passes in between early in post-IPA optimizations this patch moves the first of said to the early pipeline. The main motivation is to do things like MIN/MAX_EXPR early to avoid jump threading mess up the CFG (the case with PR87105). I realize theres early backward threading before the new early phiopt pass but that doesn't seem to do anything useful there (yet). I think it makes sense to push that later anyways. Now, early phiopt is quite confused about predict stmts still being present and turning conditional BBs into diamonds which it cannot handle. I've fixed at least stray such stmts in the BBs that are interesting. Note this may hide fallout which would otherwise be visible in the testsuite (there's no flag to avoid generating the predictors - they are emitted directly by the frontends, maybe we could drop them with -fno[-guess]-branch-probabilities at gimplification time?). There's also an effect on ifcombine which, when preceeded by phiopt, can miss cases because phiopt may value-replace some condition. The patch contains adjustments to testcases where there's no harm done in the end and leaves those FAILing where we would need to do sth. In the end it's regular pass-ordering issues but our testsuite very often ties our hands when re-ordering passes because of them. One option would be to distinguish early from late phiopt and for example avoid value-replacement - like just do MIN/MAX recognition for the vectorizer. Any comments? Some detailed notes on the remaining FAILs below. Bootstrapped and tested on x86_64-unknown-linux-gnu. >From e4f06ed39d12e7c17ff57aa17f17f54013d7869f Mon Sep 17 00:00:00 2001 From: Richard Guenther Date: Tue, 28 Aug 2018 12:53:53 +0200 Subject: [PATCH] early-phiopt FAIL: g++.dg/predict-loop-exit-2.C -std=gnu++98 scan-tree-dump-times profile_es timate "loop exit heuristics:" 2 phiopt value-replaces the last "exit" test in : _5 = foo (); if (_5 != 0) goto ; else goto ; : g.0_7 = g; if (g.0_7 <= 9) goto ; else goto ; : : # iftmp.3_1 = PHI <1(5), 0(6), 1(4)> if (iftmp.3_1 != 0) goto ; else goto ; : return; to look like : g.0_7 = g; _6 = g.0_7 <= 9; : # iftmp.3_1 = PHI <_6(5), 1(4)> if (iftmp.3_1 != 0) goto ; which confuses whatever the testcase was supposed to check (there is only one loop exit). FAIL: gcc.dg/tree-ssa/ssa-pre-32.c scan-tree-dump pre "# prephitmp_[0-9]+ = PHI <[xy]_[0-9]+(D)[^,]*, [xy]_[0-9]+(D)" phiopt optimizes : if (b_5(D) != 0) goto ; [INV] else goto ; [INV] : : # iftmp.0_3 = PHI <4294967295(2), 0(3)> _1 = iftmp.0_3 & x_8(D); to _9 = (unsigned int) b_5(D); _14 = -_9; _1 = x_8(D) & _14; but in the sequence with two same conditions but opposite bits nothing optimizes this further back to b_5(D) ? x_6(D) : y_7(D). FAIL: gcc.dg/tree-ssa/ssa-ifcombine-7.c scan-tree-dump ifcombine " > " FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c scan-tree-dump optimized "&" FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c scan-tree-dump optimized "&" FAIL: gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c scan-tree-dump-times optimized "&" 2 \|" 2 early phiopt value-replaces the inner condition which leaves ifcombine with no work. 2018-08-28 Richard Biener * passes.def (pass_all_early_optimizations): Add phi-opt after dce. (pass_all_optimizations): Remove first phi-opt pass. * tree-ssa-phiopt.c (value_replacement): Ignore NOPs and predicts in addition to debug stmts. * tree-cfg.c (gimple_empty_block_p): Likewise. * g++.dg/tree-ssa/pr21463.C: Scan phiopt2 because this testcase relies on phiprop run before. * g++.dg/tree-ssa/pr30738.C: Likewise. * g++.dg/tree-ssa/pr57380.C: Likewise. * gcc.dg/tree-ssa/pr84859.c: Likewise. * gcc.dg/tree-ssa/vrp33.c: Disable phiopt. * gcc.dg/tree-ssa/pr68198.c: Likewise. * gcc.dg/tree-ssa/pr20701.c: Likewise. * gcc.dg/tree-ssa/pr21001.c: Likewise. * gcc.dg/tree-ssa/pr21563.c: Likewise. * gcc.dg/tree-ssa/pr37508.c: Likewise. * gcc.dg/tree-ssa/vrp19.c: Likewise. * gcc.dg/tree-ssa/vrp20.c: Likewise. * gcc.dg/tree-ssa/pr45397.c: Scan phiopt2 because phiopt1 is confused by copies in the IL left by EVRP. * gcc.dg/tree-ssa/phi-opt-12.c: Likewise, this time confused by predictors. * gcc.dg/tree-ssa/phi-opt-5.c: Likewise. * gcc.dg/pr24564.c: Likewise. * gcc.dg/uninit-15.c: Switch to testing other OK alternative. diff --git a/gcc/passes.def b/gcc/passes.def index 7f4b3479a35..401d1307125 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -88,6 +88,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_merge_phi); NEXT_PASS