Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On 8/2/2022 7:00 AM, Richard Biener via Gcc-patches wrote: I am trying to make sense of back_threader_profitability::profitable_path_p and the first thing I notice is that we do /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) || contains_hot_bb)) { if (n_insns >= param_max_fsm_thread_path_insns) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "the number of instructions on the path " "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); return false; } ... } else if (!m_speed_p && n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "duplication of %i insns is needed and optimizing for size.\n", n_insns); return false; } ... return true; thus we apply the n_insns >= param_max_fsm_thread_path_insns only to "hot paths". The comment above this isn't entirely clear whether this is by design ("Be on the aggressive side here ...") but I think this is a mistake. In fact the "hot path" check seems entirely useless since if the path is not hot we simply continue threading it. I think the intent here was to allow more insns to be copied if the path was hot than if it was not not hot. But the logic seems a bit convoluted here. jeff
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, 2 Aug 2022, Jan Hubicka wrote: > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > > > > > That being said, thanks for tackling these issues that my work > > > > > triggered last release. Much appreciated. > > > > > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > > > > > - else if (n_insns > 1) > > > > + else if (!m_speed_p && n_insns > 1) > > > > > > > > causing the breakage on the 12 branch. That leads to a simpler > > > > fix I guess. Will re-test and also backport to GCC 12 if successful. > > > > > > Huh. It's been a while, but that looks like a typo. That patch was > > > supposed to be non-behavior changing. > > > > Exactly my thinking so reverting it shouldn't be a reason for > > detailed questions. Now, the contains_hot_bb computation is, > > that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa > > together with the comment and a testcase. > > > > So - Honza, what was the reasoning to look at raw BB counts here > > rather than for example the path entry edge count? > I think the explanation is in the final comment: > /* Threading is profitable if the path duplicated is hot but also > in a case we separate cold path from hot path and permit ptimization > of the hot path later. Be on the agressive side here. In some estcases, > as in PR 78407 this leads to noticeable improvements. */ > > If you have non-threadable hot path threading out cold paths will make > it easier to be optimized since you have fewer meets in the dataflow. I see. It does seem that it would be better handled by tracing the hot path but of course threading the cold path might be cheaper. That said, the cost modeling checks hotness of the threaded path by looking at its exit edge (the one we can optimize statically) but shouldn't hotness of the threaded path be determined by looking at the path entry edge count instead? With inconsistent guessed profile (inconsistent now that we can statically compute the outgoing branch on one of the paths) it's of course all GIGO, but ... I'll leave this alone for now. Still we now have the exceptions of predicted never executed entry/exit to avoid diagnostic fallout. Richard.
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
> On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > > > That being said, thanks for tackling these issues that my work > > > > triggered last release. Much appreciated. > > > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > > > - else if (n_insns > 1) > > > + else if (!m_speed_p && n_insns > 1) > > > > > > causing the breakage on the 12 branch. That leads to a simpler > > > fix I guess. Will re-test and also backport to GCC 12 if successful. > > > > Huh. It's been a while, but that looks like a typo. That patch was > > supposed to be non-behavior changing. > > Exactly my thinking so reverting it shouldn't be a reason for > detailed questions. Now, the contains_hot_bb computation is, > that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa > together with the comment and a testcase. > > So - Honza, what was the reasoning to look at raw BB counts here > rather than for example the path entry edge count? I think the explanation is in the final comment: /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit ptimization of the hot path later. Be on the agressive side here. In some estcases, as in PR 78407 this leads to noticeable improvements. */ If you have non-threadable hot path threading out cold paths will make it easier to be optimized since you have fewer meets in the dataflow. Honza > > Richard.
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
Feel free to blame me for everything except the profitability code and the generic block copier. That stuff was all there before and I mostly avoided it. :-) Thanks for the work in this space. Aldy On Tue, Aug 2, 2022, 15:29 Richard Biener wrote: > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > On Tue, Aug 2, 2022 at 1:59 PM Richard Biener wrote: > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener > wrote: > > > > > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > > > > > > > That being said, thanks for tackling these issues that my work > > > > > > triggered last release. Much appreciated. > > > > > > > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > > > > > > > - else if (n_insns > 1) > > > > > + else if (!m_speed_p && n_insns > 1) > > > > > > > > > > causing the breakage on the 12 branch. That leads to a simpler > > > > > fix I guess. Will re-test and also backport to GCC 12 if > successful. > > > > > > > > Huh. It's been a while, but that looks like a typo. That patch was > > > > supposed to be non-behavior changing. > > > > > > Exactly my thinking so reverting it shouldn't be a reason for > > > detailed questions. Now, the contains_hot_bb computation is, > > > > Sorry for the pain. > > So - actually the change was probably done on purpose (even if > reverting - which I've now already one - caused no testsuite regressions). > That's because the whole function is invoked N + 1 times for a path > of length N and we definitely want to avoid using the size optimization > heuristics when the path is not complete yet. I think the proper > way is to do > > diff --git a/gcc/tree-ssa-threadbackward.cc > b/gcc/tree-ssa-threadbackward.cc > index ba114e98a41..6979398ef76 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const > vec _path, > as in PR 78407 this leads to noticeable improvements. */ >if (m_speed_p >&& ((taken_edge && optimize_edge_for_speed_p (taken_edge)) > - || contains_hot_bb)) > + || contains_hot_bb > + /* Avoid using the size heuristics when not doing the final > +thread evaluation, we get called for each added BB > +to the path. */ > + || !taken_edge)) > { >if (n_insns >= param_max_fsm_thread_path_insns) > { > > thus assume there'll be a hot BB in the future. > > That said, the very best fix would be to not call this function > N + 1 times (I have a patch to call it only N times - yay), but > instead factor out parts to be called per BB plus keeping enough > state so we can incrementally collect info. > > There's more "odd" things in the backward threader, of course :/ > > I'm looking for things applicable to the GCC 12 branch right now > so will try the above. > > Richard. > >
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, 2 Aug 2022, Aldy Hernandez wrote: > On Tue, Aug 2, 2022 at 1:59 PM Richard Biener wrote: > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > > > > > That being said, thanks for tackling these issues that my work > > > > > triggered last release. Much appreciated. > > > > > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > > > > > - else if (n_insns > 1) > > > > + else if (!m_speed_p && n_insns > 1) > > > > > > > > causing the breakage on the 12 branch. That leads to a simpler > > > > fix I guess. Will re-test and also backport to GCC 12 if successful. > > > > > > Huh. It's been a while, but that looks like a typo. That patch was > > > supposed to be non-behavior changing. > > > > Exactly my thinking so reverting it shouldn't be a reason for > > detailed questions. Now, the contains_hot_bb computation is, > > Sorry for the pain. So - actually the change was probably done on purpose (even if reverting - which I've now already one - caused no testsuite regressions). That's because the whole function is invoked N + 1 times for a path of length N and we definitely want to avoid using the size optimization heuristics when the path is not complete yet. I think the proper way is to do diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index ba114e98a41..6979398ef76 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -767,7 +767,11 @@ back_threader_profitability::profitable_path_p (const vec _path, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) - || contains_hot_bb)) + || contains_hot_bb + /* Avoid using the size heuristics when not doing the final +thread evaluation, we get called for each added BB +to the path. */ + || !taken_edge)) { if (n_insns >= param_max_fsm_thread_path_insns) { thus assume there'll be a hot BB in the future. That said, the very best fix would be to not call this function N + 1 times (I have a patch to call it only N times - yay), but instead factor out parts to be called per BB plus keeping enough state so we can incrementally collect info. There's more "odd" things in the backward threader, of course :/ I'm looking for things applicable to the GCC 12 branch right now so will try the above. Richard.
[PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
I am trying to make sense of back_threader_profitability::profitable_path_p and the first thing I notice is that we do /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) || contains_hot_bb)) { if (n_insns >= param_max_fsm_thread_path_insns) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "the number of instructions on the path " "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); return false; } ... } else if (!m_speed_p && n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "duplication of %i insns is needed and optimizing for size.\n", n_insns); return false; } ... return true; thus we apply the n_insns >= param_max_fsm_thread_path_insns only to "hot paths". The comment above this isn't entirely clear whether this is by design ("Be on the aggressive side here ...") but I think this is a mistake. In fact the "hot path" check seems entirely useless since if the path is not hot we simply continue threading it. This was caused by r12-324-g69e5544210e3c0 and the following simply reverts the offending change. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-ssa-threadbackwards.cc (back_threader_profitability::profitable_path_p): Apply size constraints to all paths again. --- gcc/tree-ssa-threadbackward.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 0519f2a8c4b..ba114e98a41 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -794,7 +794,7 @@ back_threader_profitability::profitable_path_p (const vec _path, return false; } } - else if (!m_speed_p && n_insns > 1) + else if (n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " -- 2.35.3
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, Aug 2, 2022 at 1:59 PM Richard Biener wrote: > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > > > That being said, thanks for tackling these issues that my work > > > > triggered last release. Much appreciated. > > > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > > > - else if (n_insns > 1) > > > + else if (!m_speed_p && n_insns > 1) > > > > > > causing the breakage on the 12 branch. That leads to a simpler > > > fix I guess. Will re-test and also backport to GCC 12 if successful. > > > > Huh. It's been a while, but that looks like a typo. That patch was > > supposed to be non-behavior changing. > > Exactly my thinking so reverting it shouldn't be a reason for > detailed questions. Now, the contains_hot_bb computation is, Sorry for the pain. Thanks for taking care of this. Aldy > that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa > together with the comment and a testcase. > > So - Honza, what was the reasoning to look at raw BB counts here > rather than for example the path entry edge count? > > Richard. >
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, 2 Aug 2022, Aldy Hernandez wrote: > On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > > > Unfortunately, this was before my time, so I don't know. > > > > > > That being said, thanks for tackling these issues that my work > > > triggered last release. Much appreciated. > > > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > > > - else if (n_insns > 1) > > + else if (!m_speed_p && n_insns > 1) > > > > causing the breakage on the 12 branch. That leads to a simpler > > fix I guess. Will re-test and also backport to GCC 12 if successful. > > Huh. It's been a while, but that looks like a typo. That patch was > supposed to be non-behavior changing. Exactly my thinking so reverting it shouldn't be a reason for detailed questions. Now, the contains_hot_bb computation is, that one was introduced by Honza in r7-6476-g0f0c2cc3a17efa together with the comment and a testcase. So - Honza, what was the reasoning to look at raw BB counts here rather than for example the path entry edge count? Richard.
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, Aug 2, 2022 at 1:45 PM Richard Biener wrote: > > On Tue, 2 Aug 2022, Aldy Hernandez wrote: > > > Unfortunately, this was before my time, so I don't know. > > > > That being said, thanks for tackling these issues that my work > > triggered last release. Much appreciated. > > Ah. But it was your r12-324-g69e5544210e3c0 that did > > - else if (n_insns > 1) > + else if (!m_speed_p && n_insns > 1) > > causing the breakage on the 12 branch. That leads to a simpler > fix I guess. Will re-test and also backport to GCC 12 if successful. Huh. It's been a while, but that looks like a typo. That patch was supposed to be non-behavior changing. Aldy
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
On Tue, 2 Aug 2022, Aldy Hernandez wrote: > Unfortunately, this was before my time, so I don't know. > > That being said, thanks for tackling these issues that my work > triggered last release. Much appreciated. Ah. But it was your r12-324-g69e5544210e3c0 that did - else if (n_insns > 1) + else if (!m_speed_p && n_insns > 1) causing the breakage on the 12 branch. That leads to a simpler fix I guess. Will re-test and also backport to GCC 12 if successful. Richard. > Aldy > > On Tue, Aug 2, 2022 at 10:41 AM Richard Biener wrote: > > > > I am trying to make sense of back_threader_profitability::profitable_path_p > > and the first thing I notice is that we do > > > > /* Threading is profitable if the path duplicated is hot but also > > in a case we separate cold path from hot path and permit optimization > > of the hot path later. Be on the agressive side here. In some > > testcases, > > as in PR 78407 this leads to noticeable improvements. */ > > if (m_speed_p > > && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) > > || contains_hot_bb)) > > { > > if (n_insns >= param_max_fsm_thread_path_insns) > > { > > if (dump_file && (dump_flags & TDF_DETAILS)) > > fprintf (dump_file, " FAIL: Jump-thread path not considered: " > > "the number of instructions on the path " > > "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); > > return false; > > } > > ... > > } > > else if (!m_speed_p && n_insns > 1) > > { > > if (dump_file && (dump_flags & TDF_DETAILS)) > > fprintf (dump_file, " FAIL: Jump-thread path not considered: " > > "duplication of %i insns is needed and optimizing for > > size.\n", > > n_insns); > > return false; > > } > > ... > > return true; > > > > thus we apply the n_insns >= param_max_fsm_thread_path_insns only > > to "hot paths". The comment above this isn't entirely clear whether > > this is by design ("Be on the aggressive side here ...") but I think > > this is a mistake. In fact the "hot path" check seems entirely > > useless since if the path is not hot we simply continue threading it. > > > > I have my reservations about how we compute hot (contains_hot_bb > > in particular), but the following first refactors the above to apply > > the size constraints always and then _not_ threading if the path > > is not considered hot (but allow threading if n_insns <= 1 as with > > the !m_speed_p case). > > > > As for contains_hot_bb - it might be that this consciously captures > > the case where we separate a cold from a hot path even though the > > threaded path itself is cold. Consider > > > >A > > / \ (unlikely) > > B C > > \ / > >D > > / \ > > .. abort() > > > > when we want to thread A -> B -> D -> abort () and A (or D) > > has a hot BB count then we have contains_hot_bb even though > > the counts on the path itself are small. In fact when we > > thread the only relevant count for the resulting threaded > > path is the count of A with the A->C probability applied > > (that should also the count to subtract from the blocks > > we copied - sth missing for the backwards threader as well). > > > > So I'm wondering how the logic computing contains_hot_bb > > relates to the above comment before the costing block. > > Anyone remembers? > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > * tree-ssa-threadbackwards.cc > > (back_threader_profitability::profitable_path_p): Apply > > size constraints to all paths. Do not thread cold paths. > > --- > > gcc/tree-ssa-threadbackward.cc | 53 +- > > 1 file changed, 33 insertions(+), 20 deletions(-) > > > > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc > > index 0519f2a8c4b..a887568032b 100644 > > --- a/gcc/tree-ssa-threadbackward.cc > > +++ b/gcc/tree-ssa-threadbackward.cc > > @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const > > vec _path, > > *creates_irreducible_loop = true; > > } > > > > - /* Threading is profitable if the path duplicated is hot but also > > + const int max_cold_insns = 1; > > + if (!m_speed_p && n_insns > max_cold_insns) > > +{ > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, " FAIL: Jump-thread path not considered: " > > +"duplication of %i insns is needed and optimizing for > > size.\n", > > +n_insns); > > + return false; > > +} > > + else if (n_insns >= param_max_fsm_thread_path_insns) > > +{ > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > + fprintf (dump_file, " FAIL: Jump-thread path not considered: " > > +"the number of instructions on the path " > > +"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); > > +
Re: [PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
Unfortunately, this was before my time, so I don't know. That being said, thanks for tackling these issues that my work triggered last release. Much appreciated. Aldy On Tue, Aug 2, 2022 at 10:41 AM Richard Biener wrote: > > I am trying to make sense of back_threader_profitability::profitable_path_p > and the first thing I notice is that we do > > /* Threading is profitable if the path duplicated is hot but also > in a case we separate cold path from hot path and permit optimization > of the hot path later. Be on the agressive side here. In some testcases, > as in PR 78407 this leads to noticeable improvements. */ > if (m_speed_p > && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) > || contains_hot_bb)) > { > if (n_insns >= param_max_fsm_thread_path_insns) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, " FAIL: Jump-thread path not considered: " > "the number of instructions on the path " > "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); > return false; > } > ... > } > else if (!m_speed_p && n_insns > 1) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, " FAIL: Jump-thread path not considered: " > "duplication of %i insns is needed and optimizing for > size.\n", > n_insns); > return false; > } > ... > return true; > > thus we apply the n_insns >= param_max_fsm_thread_path_insns only > to "hot paths". The comment above this isn't entirely clear whether > this is by design ("Be on the aggressive side here ...") but I think > this is a mistake. In fact the "hot path" check seems entirely > useless since if the path is not hot we simply continue threading it. > > I have my reservations about how we compute hot (contains_hot_bb > in particular), but the following first refactors the above to apply > the size constraints always and then _not_ threading if the path > is not considered hot (but allow threading if n_insns <= 1 as with > the !m_speed_p case). > > As for contains_hot_bb - it might be that this consciously captures > the case where we separate a cold from a hot path even though the > threaded path itself is cold. Consider > >A > / \ (unlikely) > B C > \ / >D > / \ > .. abort() > > when we want to thread A -> B -> D -> abort () and A (or D) > has a hot BB count then we have contains_hot_bb even though > the counts on the path itself are small. In fact when we > thread the only relevant count for the resulting threaded > path is the count of A with the A->C probability applied > (that should also the count to subtract from the blocks > we copied - sth missing for the backwards threader as well). > > So I'm wondering how the logic computing contains_hot_bb > relates to the above comment before the costing block. > Anyone remembers? > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > * tree-ssa-threadbackwards.cc > (back_threader_profitability::profitable_path_p): Apply > size constraints to all paths. Do not thread cold paths. > --- > gcc/tree-ssa-threadbackward.cc | 53 +- > 1 file changed, 33 insertions(+), 20 deletions(-) > > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc > index 0519f2a8c4b..a887568032b 100644 > --- a/gcc/tree-ssa-threadbackward.cc > +++ b/gcc/tree-ssa-threadbackward.cc > @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const > vec _path, > *creates_irreducible_loop = true; > } > > - /* Threading is profitable if the path duplicated is hot but also > + const int max_cold_insns = 1; > + if (!m_speed_p && n_insns > max_cold_insns) > +{ > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, " FAIL: Jump-thread path not considered: " > +"duplication of %i insns is needed and optimizing for > size.\n", > +n_insns); > + return false; > +} > + else if (n_insns >= param_max_fsm_thread_path_insns) > +{ > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, " FAIL: Jump-thread path not considered: " > +"the number of instructions on the path " > +"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); > + return false; > +} > + > + /* Threading is profitable if the path duplicated is small or hot but also > in a case we separate cold path from hot path and permit optimization > of the hot path later. Be on the agressive side here. In some > testcases, > as in PR 78407 this leads to noticeable improvements. */ > - if (m_speed_p > - && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) > - || contains_hot_bb)) > + if (!(n_insns <= max_cold_insns > + || contains_hot_bb > + ||
[PATCH] Properly honor param_max_fsm_thread_path_insns in backwards threader
I am trying to make sense of back_threader_profitability::profitable_path_p and the first thing I notice is that we do /* Threading is profitable if the path duplicated is hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ if (m_speed_p && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) || contains_hot_bb)) { if (n_insns >= param_max_fsm_thread_path_insns) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "the number of instructions on the path " "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); return false; } ... } else if (!m_speed_p && n_insns > 1) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " FAIL: Jump-thread path not considered: " "duplication of %i insns is needed and optimizing for size.\n", n_insns); return false; } ... return true; thus we apply the n_insns >= param_max_fsm_thread_path_insns only to "hot paths". The comment above this isn't entirely clear whether this is by design ("Be on the aggressive side here ...") but I think this is a mistake. In fact the "hot path" check seems entirely useless since if the path is not hot we simply continue threading it. I have my reservations about how we compute hot (contains_hot_bb in particular), but the following first refactors the above to apply the size constraints always and then _not_ threading if the path is not considered hot (but allow threading if n_insns <= 1 as with the !m_speed_p case). As for contains_hot_bb - it might be that this consciously captures the case where we separate a cold from a hot path even though the threaded path itself is cold. Consider A / \ (unlikely) B C \ / D / \ .. abort() when we want to thread A -> B -> D -> abort () and A (or D) has a hot BB count then we have contains_hot_bb even though the counts on the path itself are small. In fact when we thread the only relevant count for the resulting threaded path is the count of A with the A->C probability applied (that should also the count to subtract from the blocks we copied - sth missing for the backwards threader as well). So I'm wondering how the logic computing contains_hot_bb relates to the above comment before the costing block. Anyone remembers? Bootstrap & regtest running on x86_64-unknown-linux-gnu. * tree-ssa-threadbackwards.cc (back_threader_profitability::profitable_path_p): Apply size constraints to all paths. Do not thread cold paths. --- gcc/tree-ssa-threadbackward.cc | 53 +- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc index 0519f2a8c4b..a887568032b 100644 --- a/gcc/tree-ssa-threadbackward.cc +++ b/gcc/tree-ssa-threadbackward.cc @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const vec _path, *creates_irreducible_loop = true; } - /* Threading is profitable if the path duplicated is hot but also + const int max_cold_insns = 1; + if (!m_speed_p && n_insns > max_cold_insns) +{ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " +"duplication of %i insns is needed and optimizing for size.\n", +n_insns); + return false; +} + else if (n_insns >= param_max_fsm_thread_path_insns) +{ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " +"the number of instructions on the path " +"exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n"); + return false; +} + + /* Threading is profitable if the path duplicated is small or hot but also in a case we separate cold path from hot path and permit optimization of the hot path later. Be on the agressive side here. In some testcases, as in PR 78407 this leads to noticeable improvements. */ - if (m_speed_p - && ((taken_edge && optimize_edge_for_speed_p (taken_edge)) - || contains_hot_bb)) + if (!(n_insns <= max_cold_insns + || contains_hot_bb + || (taken_edge && optimize_edge_for_speed_p (taken_edge +{ + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, " FAIL: Jump-thread path not considered: " +"path is not profitable to thread.\n"); + return false; +} + + /* If the path is not small to duplicate and either the entry or + the final destination is probably never executed avoid separating + the cold path since that can lead