Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On 8/9/19 2:51 PM, Martin Liška wrote: > On 8/9/19 2:13 PM, Michael Matz wrote: >> Hi, >> >> On Fri, 9 Aug 2019, Richard Biener wrote: >> >>> Of course I'm still afraid that the other code exists for a reason >>> (tuning/hack/whatever...). >>> >>> Note that with the patch we're now applying LOOP_ALIGN to L2 here: >>> if (a) >>> foo = bar; >>> L2: >>> blah; >>> >>> because there's a jump-around and a fallthru. >> >> Yeah, and I think that would be wrong. That's why the existing code (not >> sure about after the patch) does this only when L2 is reached by one edge >> much more often than by the other edges. >> >>> So I'm not sure we don't need to apply some condition on fallthru_count >>> (which is unused after your patch btw). >> >> >> Ciao, >> Michael. >> > > I'm sending numbers for the opposite condition. > >> Of course I'm still afraid that the other code exists for a reason >> (tuning/hack/whatever...). > > I fully agree that the current code is quite hacking and was probably subject > of some tuning. > > I'm leaving the decision about simplification to you? > You're much more experienced in the area :) > > Martin > @Honza: PING Martin pEpkey.asc Description: application/pgp-keys
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On 8/9/19 2:13 PM, Michael Matz wrote: > Hi, > > On Fri, 9 Aug 2019, Richard Biener wrote: > >> Of course I'm still afraid that the other code exists for a reason >> (tuning/hack/whatever...). >> >> Note that with the patch we're now applying LOOP_ALIGN to L2 here: >> if (a) >> foo = bar; >> L2: >> blah; >> >> because there's a jump-around and a fallthru. > > Yeah, and I think that would be wrong. That's why the existing code (not > sure about after the patch) does this only when L2 is reached by one edge > much more often than by the other edges. > >> So I'm not sure we don't need to apply some condition on fallthru_count >> (which is unused after your patch btw). > > > Ciao, > Michael. > I'm sending numbers for the opposite condition. > Of course I'm still afraid that the other code exists for a reason > (tuning/hack/whatever...). I fully agree that the current code is quite hacking and was probably subject of some tuning. I'm leaving the decision about simplification to you? You're much more experienced in the area :) Martin lnt-loop-alignment-v2.pdf.bz2 Description: application/bzip
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
Hi, On Fri, 9 Aug 2019, Richard Biener wrote: > Of course I'm still afraid that the other code exists for a reason > (tuning/hack/whatever...). > > Note that with the patch we're now applying LOOP_ALIGN to L2 here: > if (a) > foo = bar; > L2: > blah; > > because there's a jump-around and a fallthru. Yeah, and I think that would be wrong. That's why the existing code (not sure about after the patch) does this only when L2 is reached by one edge much more often than by the other edges. > So I'm not sure we don't need to apply some condition on fallthru_count > (which is unused after your patch btw). Ciao, Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On Thu, Aug 8, 2019 at 2:24 PM Michael Matz wrote: > > Hi, > > On Thu, 8 Aug 2019, Martin Liška wrote: > > > > So docs have > > > > > > @defmac JUMP_ALIGN (@var{label}) > > > The alignment (log base 2) to put in front of @var{label}, which is > > > a common destination of jumps and has no fallthru incoming edge. > > So, per docu: JUMP_ALIGN implies !fallthru ... > > > > align_flags alignment = has_fallthru ? JUMP_ALIGN (label) : > > > LOOP_ALIGN (label); > > ... exactly the opposite way here. Yeah, sorry - my mistake. > > > instead of the two different conditions? Or the JUMP_ALIGN case > > > computing "common destination" instead of "frequently executed" > > > somehow but I think it makes sense that those are actually the same > > > here (frequently executed). It oddly looks at prev_bb and is not > > > guarded with optimize_bb_for_speed_p at all so this all is > > > full of heuristics and changing anything here just based on x86 > > > measurements is surely going to cause issues for targets more > > > sensitive to (mis-)alignment. > > > > I like you patch, it's a rapid simplification of the code which > > we have there. > > Yeah, but it's also contradicting the documentation. And I think the docu > makes sense, because it means that there is no padding inserted on the > fall-thru path (because there is none). So please measure with the > opposite direction. (I still think these conditions shouldn't be > hot-needled, but rather should somewhat make sense in the abstract). Of course I'm still afraid that the other code exists for a reason (tuning/hack/whatever...). Note that with the patch we're now applying LOOP_ALIGN to L2 here: if (a) foo = bar; L2: blah; because there's a jump-around and a fallthru. So I'm not sure we don't need to apply some condition on fallthru_count (which is unused after your patch btw). Richard. > > Ciao, > Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
Hi, On Thu, 8 Aug 2019, Martin Liška wrote: > > So docs have > > > > @defmac JUMP_ALIGN (@var{label}) > > The alignment (log base 2) to put in front of @var{label}, which is > > a common destination of jumps and has no fallthru incoming edge. So, per docu: JUMP_ALIGN implies !fallthru ... > > align_flags alignment = has_fallthru ? JUMP_ALIGN (label) : > > LOOP_ALIGN (label); ... exactly the opposite way here. > > instead of the two different conditions? Or the JUMP_ALIGN case > > computing "common destination" instead of "frequently executed" > > somehow but I think it makes sense that those are actually the same > > here (frequently executed). It oddly looks at prev_bb and is not > > guarded with optimize_bb_for_speed_p at all so this all is > > full of heuristics and changing anything here just based on x86 > > measurements is surely going to cause issues for targets more > > sensitive to (mis-)alignment. > > I like you patch, it's a rapid simplification of the code which > we have there. Yeah, but it's also contradicting the documentation. And I think the docu makes sense, because it means that there is no padding inserted on the fall-thru path (because there is none). So please measure with the opposite direction. (I still think these conditions shouldn't be hot-needled, but rather should somewhat make sense in the abstract). Ciao, Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On 7/11/19 11:42 AM, Richard Biener wrote: > On Wed, Jul 10, 2019 at 5:52 PM Richard Biener > wrote: >> >> On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz wrote: >>> Hi, >>> >>> On Tue, 9 Jul 2019, Richard Biener wrote: >>> > The basic block index is not a DFS index, so no, that's not a test >>> for > backedge. I think in CFG RTL mode the BB index designates the order of the BBs >>> in the object file? So this is a way to identify backwards jumps? >>> >>> Even if it means a backwards jump (and that's not always the case, the >>> insns are emitted by following the NEXT_INSN links, without a CFG, and >>> that all happens after machine-dependend reorg, and going out of cfg >>> layout might link insn together even from high index BBs to low index >>> BBs >>> (e.g. because of fall-through)), that's still not a backedge in the >>> general case. If a heuristic is enough here it might be okay, though. >>> >>> OTOH, as here a CFG still exists, why not simply rely on a proper DFS >>> marking backedges? >> >> Because proper backedges is not what we want here, see honzas example. >> >> So I'm second-guessing why we have different LOOP_ALIGN and when it makes >> sense to apply. > > So docs have > > @defmac JUMP_ALIGN (@var{label}) > The alignment (log base 2) to put in front of @var{label}, which is > a common destination of jumps and has no fallthru incoming edge. > > ... > > @defmac LOOP_ALIGN (@var{label}) > The alignment (log base 2) to put in front of @var{label} that heads > a frequently executed basic block (usually the header of a loop). > > so I would expect the alignment pass to have > > if ( (branch_count > count_threshold > || (bb->count > bb->prev_bb->count.apply_scale (10, 1) > && (bb->prev_bb->count > <= ENTRY_BLOCK_PTR_FOR_FN (cfun) >->count.apply_scale (1, 2) > { > align_flags alignment = has_fallthru ? JUMP_ALIGN (label) : > LOOP_ALIGN (label); > if (dump_file) > fprintf (dump_file, " jump alignment added.\n"); > max_alignment = align_flags::max (max_alignment, alignment); > } Sorry for the delay. > > instead of the two different conditions? Or the JUMP_ALIGN case > computing "common destination" instead of "frequently executed" > somehow but I think it makes sense that those are actually the same > here (frequently executed). It oddly looks at prev_bb and is not > guarded with optimize_bb_for_speed_p at all so this all is > full of heuristics and changing anything here just based on x86 > measurements is surely going to cause issues for targets more > sensitive to (mis-)alignment. I like you patch, it's a rapid simplification of the code which we have there. I'm sending LNT results before and after the revision. It show there are quite some significant changes. lbm is a known issue where we are lucky with the patch. Thoughts? Martin > > Richard. > >> Richard. >> >>> >>> Ciao, >>> Michael. >> >From b5cad0c2bdc611aedf32a839ead1774fd82fdad4 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Tue, 6 Aug 2019 18:09:25 +0200 Subject: [PATCH] Simplify alignment. --- gcc/final.c | 32 1 file changed, 8 insertions(+), 24 deletions(-) diff --git a/gcc/final.c b/gcc/final.c index fefc4874b24..b33dfc6646e 100644 --- a/gcc/final.c +++ b/gcc/final.c @@ -722,32 +722,16 @@ compute_alignments (void) than the predecessor and the predecessor is likely to not be executed when function is called. */ - if (!has_fallthru - && (branch_count > count_threshold - || (bb->count > bb->prev_bb->count.apply_scale (10, 1) - && (bb->prev_bb->count - <= ENTRY_BLOCK_PTR_FOR_FN (cfun) - ->count.apply_scale (1, 2) + if (branch_count > count_threshold + || (bb->count > bb->prev_bb->count.apply_scale (10, 1) + && (bb->prev_bb->count + <= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.apply_scale (1, 2 { - align_flags alignment = JUMP_ALIGN (label); + align_flags alignment + = has_fallthru ? JUMP_ALIGN (label) : LOOP_ALIGN (label); if (dump_file) - fprintf (dump_file, " jump alignment added.\n"); - max_alignment = align_flags::max (max_alignment, alignment); - } - /* In case block is frequent and reached mostly by non-fallthru edge, - align it. It is most likely a first block of loop. */ - if (has_fallthru - && !(single_succ_p (bb) - && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) - && optimize_bb_for_speed_p (bb) - && branch_count + fallthru_count > count_threshold - && (branch_count - > fallthru_count.apply_scale - (PARAM_VALUE (PARAM_ALIGN_LOOP_ITERATIONS), 1))) - { - align_flags alignment = LOOP_ALIGN (label); - if (dump_file) - fprintf (dump_file, " internal loop alignment added.\n"); + fprintf (dump_file, " %s alignment added.\n", + has_fallthru ? "loop" : "internal
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On Wed, Jul 10, 2019 at 5:52 PM Richard Biener wrote: > > On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz wrote: > >Hi, > > > >On Tue, 9 Jul 2019, Richard Biener wrote: > > > >> >The basic block index is not a DFS index, so no, that's not a test > >for > >> >backedge. > >> > >> I think in CFG RTL mode the BB index designates the order of the BBs > >in > >> the object file? So this is a way to identify backwards jumps? > > > >Even if it means a backwards jump (and that's not always the case, the > >insns are emitted by following the NEXT_INSN links, without a CFG, and > >that all happens after machine-dependend reorg, and going out of cfg > >layout might link insn together even from high index BBs to low index > >BBs > >(e.g. because of fall-through)), that's still not a backedge in the > >general case. If a heuristic is enough here it might be okay, though. > > > >OTOH, as here a CFG still exists, why not simply rely on a proper DFS > >marking backedges? > > Because proper backedges is not what we want here, see honzas example. > > So I'm second-guessing why we have different LOOP_ALIGN and when it makes > sense to apply. So docs have @defmac JUMP_ALIGN (@var{label}) The alignment (log base 2) to put in front of @var{label}, which is a common destination of jumps and has no fallthru incoming edge. ... @defmac LOOP_ALIGN (@var{label}) The alignment (log base 2) to put in front of @var{label} that heads a frequently executed basic block (usually the header of a loop). so I would expect the alignment pass to have if ( (branch_count > count_threshold || (bb->count > bb->prev_bb->count.apply_scale (10, 1) && (bb->prev_bb->count <= ENTRY_BLOCK_PTR_FOR_FN (cfun) ->count.apply_scale (1, 2) { align_flags alignment = has_fallthru ? JUMP_ALIGN (label) : LOOP_ALIGN (label); if (dump_file) fprintf (dump_file, " jump alignment added.\n"); max_alignment = align_flags::max (max_alignment, alignment); } instead of the two different conditions? Or the JUMP_ALIGN case computing "common destination" instead of "frequently executed" somehow but I think it makes sense that those are actually the same here (frequently executed). It oddly looks at prev_bb and is not guarded with optimize_bb_for_speed_p at all so this all is full of heuristics and changing anything here just based on x86 measurements is surely going to cause issues for targets more sensitive to (mis-)alignment. Richard. > Richard. > > > > >Ciao, > >Michael. >
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz wrote: >Hi, > >On Tue, 9 Jul 2019, Richard Biener wrote: > >> >The basic block index is not a DFS index, so no, that's not a test >for >> >backedge. >> >> I think in CFG RTL mode the BB index designates the order of the BBs >in >> the object file? So this is a way to identify backwards jumps? > >Even if it means a backwards jump (and that's not always the case, the >insns are emitted by following the NEXT_INSN links, without a CFG, and >that all happens after machine-dependend reorg, and going out of cfg >layout might link insn together even from high index BBs to low index >BBs >(e.g. because of fall-through)), that's still not a backedge in the >general case. If a heuristic is enough here it might be okay, though. > >OTOH, as here a CFG still exists, why not simply rely on a proper DFS >marking backedges? Because proper backedges is not what we want here, see honzas example. So I'm second-guessing why we have different LOOP_ALIGN and when it makes sense to apply. Richard. > >Ciao, >Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
Hi, On Tue, 9 Jul 2019, Richard Biener wrote: > >The basic block index is not a DFS index, so no, that's not a test for > >backedge. > > I think in CFG RTL mode the BB index designates the order of the BBs in > the object file? So this is a way to identify backwards jumps? Even if it means a backwards jump (and that's not always the case, the insns are emitted by following the NEXT_INSN links, without a CFG, and that all happens after machine-dependend reorg, and going out of cfg layout might link insn together even from high index BBs to low index BBs (e.g. because of fall-through)), that's still not a backedge in the general case. If a heuristic is enough here it might be okay, though. OTOH, as here a CFG still exists, why not simply rely on a proper DFS marking backedges? Ciao, Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On July 9, 2019 3:10:19 PM GMT+02:00, Michael Matz wrote: >Hi, > >On Tue, 9 Jul 2019, Richard Biener wrote: > >> > So a "backedge" in this sense would be e->dest->index < >e->src->index. >> > No? >> >> To me the following would make sense. > >The basic block index is not a DFS index, so no, that's not a test for >backedge. I think in CFG RTL mode the BB index designates the order of the BBs in the object file? So this is a way to identify backwards jumps? Richard. > >Ciao, >Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
Hi, On Tue, 9 Jul 2019, Richard Biener wrote: > > So a "backedge" in this sense would be e->dest->index < e->src->index. > > No? > > To me the following would make sense. The basic block index is not a DFS index, so no, that's not a test for backedge. Ciao, Michael.
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On Tue, Jul 9, 2019 at 12:23 PM Richard Biener wrote: > > On Tue, Jul 9, 2019 at 12:22 PM Richard Biener > wrote: > > > > On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka wrote: > > > > > > > Hi. > > > > > > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the > > > > basic blocks for which it makes the biggest sense. I quite some binary > > > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also > > > > slightly > > > > positive. > > > > > > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > > > > > > > Ready to be installed? > > > The original idea of distinction between jump alignment and loop > > > alignment was that they have two basic meanings: > > > 1) jump alignment is there to avoid jumping just to the end of decode > > > window (if the window is aligned) so CPU will get stuck after reaching > > > the jump and also to possibly reduce code cache polution by populating > > > by code that is not executed > > > 2) loop alignment aims to fit loop in as few cache windows as possible > > > > > > Now if you have loop laid in a way that header of loop is not first > > > basic block, 2) IMO still apply. I.e. > > > > > > jump loop > > > :loopback > > > loop body > > > :loop > > > if cond jump to loopback > > > > > > So dropping loop alignment for those does not seem to make much sense > > > from high level. We may want to have differnt alignment for loops > > > starting by header and loops starting in the middle, but I still liked > > > more your patch which did bundles for loops. > > > > > > modern x86 chips are not very good testing targets on it. I guess > > > generic changes to alignment needs to be tested on other chips too. > > > > > > Honza > > > > Thanks, > > > > Martin > > > > > > > > gcc/ChangeLog: > > > > > > > > 2019-07-09 Martin Liska > > > > > > > > * final.c (compute_alignments): Apply the LOOP_ALIGN only > > > > to basic blocks that all loop headers. > > > > --- > > > > gcc/final.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > diff --git a/gcc/final.c b/gcc/final.c > > > > index fefc4874b24..ce2678da988 100644 > > > > --- a/gcc/final.c > > > > +++ b/gcc/final.c > > > > @@ -739,6 +739,7 @@ compute_alignments (void) > > > >if (has_fallthru > > > > && !(single_succ_p (bb) > > > > && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) > > > > + && bb->loop_father->header == bb > > > > I agree that the above is the wrong condition - but I'm not sure we > > only end up using LOOP_ALIGN for blocks reached by a DFS_BACK > > edge. Note that DFS_BACK would have to be applied considering > > the current CFG layout, simply doing mark_dfs_back_edges doesn't > > work (we're in CFG layout mode here, no?). > > So a "backedge" in this sense would be e->dest->index < e->src->index. > No? To me the following would make sense. Index: gcc/final.c === --- gcc/final.c (revision 273294) +++ gcc/final.c (working copy) @@ -669,6 +669,7 @@ compute_alignments (void) { rtx_insn *label = BB_HEAD (bb); bool has_fallthru = 0; + bool has_backedge = 0; edge e; edge_iterator ei; @@ -693,6 +694,8 @@ compute_alignments (void) has_fallthru = 1, fallthru_count += e->count (); else branch_count += e->count (); + if (e->src->index > bb->index) + has_backedge = 1; } if (dump_file) { @@ -736,7 +739,7 @@ compute_alignments (void) } /* In case block is frequent and reached mostly by non-fallthru edge, align it. It is most likely a first block of loop. */ - if (has_fallthru + if (has_backedge && !(single_succ_p (bb) && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) && optimize_bb_for_speed_p (bb)
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On Tue, Jul 9, 2019 at 12:22 PM Richard Biener wrote: > > On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka wrote: > > > > > Hi. > > > > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the > > > basic blocks for which it makes the biggest sense. I quite some binary > > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly > > > positive. > > > > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > > > > > Ready to be installed? > > The original idea of distinction between jump alignment and loop > > alignment was that they have two basic meanings: > > 1) jump alignment is there to avoid jumping just to the end of decode > > window (if the window is aligned) so CPU will get stuck after reaching > > the jump and also to possibly reduce code cache polution by populating > > by code that is not executed > > 2) loop alignment aims to fit loop in as few cache windows as possible > > > > Now if you have loop laid in a way that header of loop is not first > > basic block, 2) IMO still apply. I.e. > > > > jump loop > > :loopback > > loop body > > :loop > > if cond jump to loopback > > > > So dropping loop alignment for those does not seem to make much sense > > from high level. We may want to have differnt alignment for loops > > starting by header and loops starting in the middle, but I still liked > > more your patch which did bundles for loops. > > > > modern x86 chips are not very good testing targets on it. I guess > > generic changes to alignment needs to be tested on other chips too. > > > > Honza > > > Thanks, > > > Martin > > > > > > gcc/ChangeLog: > > > > > > 2019-07-09 Martin Liska > > > > > > * final.c (compute_alignments): Apply the LOOP_ALIGN only > > > to basic blocks that all loop headers. > > > --- > > > gcc/final.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > diff --git a/gcc/final.c b/gcc/final.c > > > index fefc4874b24..ce2678da988 100644 > > > --- a/gcc/final.c > > > +++ b/gcc/final.c > > > @@ -739,6 +739,7 @@ compute_alignments (void) > > >if (has_fallthru > > > && !(single_succ_p (bb) > > > && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) > > > + && bb->loop_father->header == bb > > I agree that the above is the wrong condition - but I'm not sure we > only end up using LOOP_ALIGN for blocks reached by a DFS_BACK > edge. Note that DFS_BACK would have to be applied considering > the current CFG layout, simply doing mark_dfs_back_edges doesn't > work (we're in CFG layout mode here, no?). So a "backedge" in this sense would be e->dest->index < e->src->index. No? > Eventually the code > counting brances effectively already does this though. > > The odd thing is that we apply LOOP_ALIGN only to blocks that > have a fallthru incoming edge. I don't see Honzas example > above having one. > > > > && optimize_bb_for_speed_p (bb) > > > && branch_count + fallthru_count > count_threshold > > > && (branch_count > > > > > > > > >
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka wrote: > > > Hi. > > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the > > basic blocks for which it makes the biggest sense. I quite some binary > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly > > positive. > > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > > > Ready to be installed? > The original idea of distinction between jump alignment and loop > alignment was that they have two basic meanings: > 1) jump alignment is there to avoid jumping just to the end of decode > window (if the window is aligned) so CPU will get stuck after reaching > the jump and also to possibly reduce code cache polution by populating > by code that is not executed > 2) loop alignment aims to fit loop in as few cache windows as possible > > Now if you have loop laid in a way that header of loop is not first > basic block, 2) IMO still apply. I.e. > > jump loop > :loopback > loop body > :loop > if cond jump to loopback > > So dropping loop alignment for those does not seem to make much sense > from high level. We may want to have differnt alignment for loops > starting by header and loops starting in the middle, but I still liked > more your patch which did bundles for loops. > > modern x86 chips are not very good testing targets on it. I guess > generic changes to alignment needs to be tested on other chips too. > > Honza > > Thanks, > > Martin > > > > gcc/ChangeLog: > > > > 2019-07-09 Martin Liska > > > > * final.c (compute_alignments): Apply the LOOP_ALIGN only > > to basic blocks that all loop headers. > > --- > > gcc/final.c | 1 + > > 1 file changed, 1 insertion(+) > > > > > > > diff --git a/gcc/final.c b/gcc/final.c > > index fefc4874b24..ce2678da988 100644 > > --- a/gcc/final.c > > +++ b/gcc/final.c > > @@ -739,6 +739,7 @@ compute_alignments (void) > >if (has_fallthru > > && !(single_succ_p (bb) > > && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) > > + && bb->loop_father->header == bb I agree that the above is the wrong condition - but I'm not sure we only end up using LOOP_ALIGN for blocks reached by a DFS_BACK edge. Note that DFS_BACK would have to be applied considering the current CFG layout, simply doing mark_dfs_back_edges doesn't work (we're in CFG layout mode here, no?). Eventually the code counting brances effectively already does this though. The odd thing is that we apply LOOP_ALIGN only to blocks that have a fallthru incoming edge. I don't see Honzas example above having one. > > && optimize_bb_for_speed_p (bb) > > && branch_count + fallthru_count > count_threshold > > && (branch_count > > > > >
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
On 7/9/19 11:56 AM, Jan Hubicka wrote: >> Hi. >> >> I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the >> basic blocks for which it makes the biggest sense. I quite some binary >> size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly >> positive. >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? > The original idea of distinction between jump alignment and loop > alignment was that they have two basic meanings: > 1) jump alignment is there to avoid jumping just to the end of decode > window (if the window is aligned) so CPU will get stuck after reaching > the jump and also to possibly reduce code cache polution by populating > by code that is not executed > 2) loop alignment aims to fit loop in as few cache windows as possible > > Now if you have loop laid in a way that header of loop is not first > basic block, 2) IMO still apply. I.e. > > jump loop > :loopback > loop body > :loop > if cond jump to loopback > > So dropping loop alignment for those does not seem to make much sense > from high level. We may want to have differnt alignment for loops > starting by header and loops starting in the middle, That's quite complicated condition, I would not introduce a new alignment. > but I still liked > more your patch which did bundles for loops. The patch caused regression for quite some benchmarks and has it's own problems (need of a recent GAS, not doing a bundle for bundles that can't fit in a single bundle window). For that reasons, I decided to not work on it any longer. Martin > > modern x86 chips are not very good testing targets on it. I guess > generic changes to alignment needs to be tested on other chips too. > > Honza >> Thanks, >> Martin >> >> gcc/ChangeLog: >> >> 2019-07-09 Martin Liska >> >> * final.c (compute_alignments): Apply the LOOP_ALIGN only >> to basic blocks that all loop headers. >> --- >> gcc/final.c | 1 + >> 1 file changed, 1 insertion(+) >> >> > >> diff --git a/gcc/final.c b/gcc/final.c >> index fefc4874b24..ce2678da988 100644 >> --- a/gcc/final.c >> +++ b/gcc/final.c >> @@ -739,6 +739,7 @@ compute_alignments (void) >>if (has_fallthru >>&& !(single_succ_p (bb) >> && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) >> + && bb->loop_father->header == bb >>&& optimize_bb_for_speed_p (bb) >>&& branch_count + fallthru_count > count_threshold >>&& (branch_count >> > > >
Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
> Hi. > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the > basic blocks for which it makes the biggest sense. I quite some binary > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly > positive. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? The original idea of distinction between jump alignment and loop alignment was that they have two basic meanings: 1) jump alignment is there to avoid jumping just to the end of decode window (if the window is aligned) so CPU will get stuck after reaching the jump and also to possibly reduce code cache polution by populating by code that is not executed 2) loop alignment aims to fit loop in as few cache windows as possible Now if you have loop laid in a way that header of loop is not first basic block, 2) IMO still apply. I.e. jump loop :loopback loop body :loop if cond jump to loopback So dropping loop alignment for those does not seem to make much sense from high level. We may want to have differnt alignment for loops starting by header and loops starting in the middle, but I still liked more your patch which did bundles for loops. modern x86 chips are not very good testing targets on it. I guess generic changes to alignment needs to be tested on other chips too. Honza > Thanks, > Martin > > gcc/ChangeLog: > > 2019-07-09 Martin Liska > > * final.c (compute_alignments): Apply the LOOP_ALIGN only > to basic blocks that all loop headers. > --- > gcc/final.c | 1 + > 1 file changed, 1 insertion(+) > > > diff --git a/gcc/final.c b/gcc/final.c > index fefc4874b24..ce2678da988 100644 > --- a/gcc/final.c > +++ b/gcc/final.c > @@ -739,6 +739,7 @@ compute_alignments (void) >if (has_fallthru > && !(single_succ_p (bb) > && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)) > + && bb->loop_father->header == bb > && optimize_bb_for_speed_p (bb) > && branch_count + fallthru_count > count_threshold > && (branch_count >