Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-10-29 Thread Martin Liška
On 8/9/19 2:51 PM, Martin Liška wrote:
> On 8/9/19 2:13 PM, Michael Matz wrote:
>> Hi,
>>
>> On Fri, 9 Aug 2019, Richard Biener wrote:
>>
>>> Of course I'm still afraid that the other code exists for a reason
>>> (tuning/hack/whatever...).
>>>
>>> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>>>   if (a)
>>> foo = bar;
>>> L2:
>>>   blah;
>>>
>>> because there's a jump-around and a fallthru.
>>
>> Yeah, and I think that would be wrong.  That's why the existing code (not 
>> sure about after the patch) does this only when L2 is reached by one edge 
>> much more often than by the other edges.
>>
>>> So I'm not sure we don't need to apply some condition on fallthru_count 
>>> (which is unused after your patch btw).
>>
>>
>> Ciao,
>> Michael.
>>
> 
> I'm sending numbers for the opposite condition.
> 
>> Of course I'm still afraid that the other code exists for a reason
>> (tuning/hack/whatever...).
> 
> I fully agree that the current code is quite hacking and was probably subject
> of some tuning.
> 
> I'm leaving the decision about simplification to you?
> You're much more experienced in the area :)
> 
> Martin
> 

@Honza: PING

Martin


pEpkey.asc
Description: application/pgp-keys


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-08-09 Thread Martin Liška
On 8/9/19 2:13 PM, Michael Matz wrote:
> Hi,
> 
> On Fri, 9 Aug 2019, Richard Biener wrote:
> 
>> Of course I'm still afraid that the other code exists for a reason
>> (tuning/hack/whatever...).
>>
>> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>>   if (a)
>> foo = bar;
>> L2:
>>   blah;
>>
>> because there's a jump-around and a fallthru.
> 
> Yeah, and I think that would be wrong.  That's why the existing code (not 
> sure about after the patch) does this only when L2 is reached by one edge 
> much more often than by the other edges.
> 
>> So I'm not sure we don't need to apply some condition on fallthru_count 
>> (which is unused after your patch btw).
> 
> 
> Ciao,
> Michael.
> 

I'm sending numbers for the opposite condition.

> Of course I'm still afraid that the other code exists for a reason
> (tuning/hack/whatever...).

I fully agree that the current code is quite hacking and was probably subject
of some tuning.

I'm leaving the decision about simplification to you?
You're much more experienced in the area :)

Martin


lnt-loop-alignment-v2.pdf.bz2
Description: application/bzip


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-08-09 Thread Michael Matz
Hi,

On Fri, 9 Aug 2019, Richard Biener wrote:

> Of course I'm still afraid that the other code exists for a reason
> (tuning/hack/whatever...).
> 
> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>   if (a)
> foo = bar;
> L2:
>   blah;
> 
> because there's a jump-around and a fallthru.

Yeah, and I think that would be wrong.  That's why the existing code (not 
sure about after the patch) does this only when L2 is reached by one edge 
much more often than by the other edges.

> So I'm not sure we don't need to apply some condition on fallthru_count 
> (which is unused after your patch btw).


Ciao,
Michael.


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-08-09 Thread Richard Biener
On Thu, Aug 8, 2019 at 2:24 PM Michael Matz  wrote:
>
> Hi,
>
> On Thu, 8 Aug 2019, Martin Liška wrote:
>
> > > So docs have
> > >
> > > @defmac JUMP_ALIGN (@var{label})
> > > The alignment (log base 2) to put in front of @var{label}, which is
> > > a common destination of jumps and has no fallthru incoming edge.
>
> So, per docu: JUMP_ALIGN implies !fallthru ...
>
> > >   align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> > > LOOP_ALIGN (label);
>
> ... exactly the opposite way here.

Yeah, sorry - my mistake.

> > > instead of the two different conditions?  Or the JUMP_ALIGN case
> > > computing "common destination" instead of "frequently executed"
> > > somehow but I think it makes sense that those are actually the same
> > > here (frequently executed).  It oddly looks at prev_bb and is not
> > > guarded with optimize_bb_for_speed_p at all so this all is
> > > full of heuristics and changing anything here just based on x86
> > > measurements is surely going to cause issues for targets more
> > > sensitive to (mis-)alignment.
> >
> > I like you patch, it's a rapid simplification of the code which
> > we have there.
>
> Yeah, but it's also contradicting the documentation.  And I think the docu
> makes sense, because it means that there is no padding inserted on the
> fall-thru path (because there is none).  So please measure with the
> opposite direction.  (I still think these conditions shouldn't be
> hot-needled, but rather should somewhat make sense in the abstract).

Of course I'm still afraid that the other code exists for a reason
(tuning/hack/whatever...).

Note that with the patch we're now applying LOOP_ALIGN to L2 here:
  if (a)
foo = bar;
L2:
  blah;

because there's a jump-around and a fallthru.  So I'm not sure
we don't need to apply some condition on fallthru_count
(which is unused after your patch btw).

Richard.


>
> Ciao,
> Michael.


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-08-08 Thread Michael Matz
Hi,

On Thu, 8 Aug 2019, Martin Liška wrote:

> > So docs have
> > 
> > @defmac JUMP_ALIGN (@var{label})
> > The alignment (log base 2) to put in front of @var{label}, which is
> > a common destination of jumps and has no fallthru incoming edge.

So, per docu: JUMP_ALIGN implies !fallthru ...

> >   align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> > LOOP_ALIGN (label);

... exactly the opposite way here.

> > instead of the two different conditions?  Or the JUMP_ALIGN case
> > computing "common destination" instead of "frequently executed"
> > somehow but I think it makes sense that those are actually the same
> > here (frequently executed).  It oddly looks at prev_bb and is not
> > guarded with optimize_bb_for_speed_p at all so this all is
> > full of heuristics and changing anything here just based on x86
> > measurements is surely going to cause issues for targets more
> > sensitive to (mis-)alignment.
> 
> I like you patch, it's a rapid simplification of the code which
> we have there.

Yeah, but it's also contradicting the documentation.  And I think the docu 
makes sense, because it means that there is no padding inserted on the 
fall-thru path (because there is none).  So please measure with the 
opposite direction.  (I still think these conditions shouldn't be 
hot-needled, but rather should somewhat make sense in the abstract).


Ciao,
Michael.

Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-08-08 Thread Martin Liška
On 7/11/19 11:42 AM, Richard Biener wrote:
> On Wed, Jul 10, 2019 at 5:52 PM Richard Biener
>  wrote:
>>
>> On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz  wrote:
>>> Hi,
>>>
>>> On Tue, 9 Jul 2019, Richard Biener wrote:
>>>
> The basic block index is not a DFS index, so no, that's not a test
>>> for
> backedge.

 I think in CFG RTL mode the BB index designates the order of the BBs
>>> in
 the object file? So this is a way to identify backwards jumps?
>>>
>>> Even if it means a backwards jump (and that's not always the case, the
>>> insns are emitted by following the NEXT_INSN links, without a CFG, and
>>> that all happens after machine-dependend reorg, and going out of cfg
>>> layout might link insn together even from high index BBs to low index
>>> BBs
>>> (e.g. because of fall-through)), that's still not a backedge in the
>>> general case.  If a heuristic is enough here it might be okay, though.
>>>
>>> OTOH, as here a CFG still exists, why not simply rely on a proper DFS
>>> marking backedges?
>>
>> Because proper backedges is not what we want here, see honzas example.
>>
>> So I'm second-guessing why we have different LOOP_ALIGN and when it makes 
>> sense to apply.
> 
> So docs have
> 
> @defmac JUMP_ALIGN (@var{label})
> The alignment (log base 2) to put in front of @var{label}, which is
> a common destination of jumps and has no fallthru incoming edge.
> 
> ...
> 
> @defmac LOOP_ALIGN (@var{label})
> The alignment (log base 2) to put in front of @var{label} that heads
> a frequently executed basic block (usually the header of a loop).
> 
> so I would expect the alignment pass to have
> 
>   if ( (branch_count > count_threshold
>   || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
>   && (bb->prev_bb->count
>   <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
>->count.apply_scale (1, 2)
> {
>   align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> LOOP_ALIGN (label);
>   if (dump_file)
> fprintf (dump_file, "  jump alignment added.\n");
>   max_alignment = align_flags::max (max_alignment, alignment);
> }

Sorry for the delay.

> 
> instead of the two different conditions?  Or the JUMP_ALIGN case
> computing "common destination" instead of "frequently executed"
> somehow but I think it makes sense that those are actually the same
> here (frequently executed).  It oddly looks at prev_bb and is not
> guarded with optimize_bb_for_speed_p at all so this all is
> full of heuristics and changing anything here just based on x86
> measurements is surely going to cause issues for targets more
> sensitive to (mis-)alignment.

I like you patch, it's a rapid simplification of the code which
we have there.

I'm sending LNT results before and after the revision. It show
there are quite some significant changes. lbm is a known issue
where we are lucky with the patch.

Thoughts?
Martin

> 
> Richard.
> 
>> Richard.
>>
>>>
>>> Ciao,
>>> Michael.
>>

>From b5cad0c2bdc611aedf32a839ead1774fd82fdad4 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 6 Aug 2019 18:09:25 +0200
Subject: [PATCH] Simplify alignment.

---
 gcc/final.c | 32 
 1 file changed, 8 insertions(+), 24 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fefc4874b24..b33dfc6646e 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -722,32 +722,16 @@ compute_alignments (void)
 	 than the predecessor and the predecessor is likely to not be executed
 	 when function is called.  */
 
-  if (!has_fallthru
-	  && (branch_count > count_threshold
-	  || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
-		  && (bb->prev_bb->count
-		  <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
-			   ->count.apply_scale (1, 2)
+  if (branch_count > count_threshold
+	  || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
+	  && (bb->prev_bb->count
+		  <= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.apply_scale (1, 2
 	{
-	  align_flags alignment = JUMP_ALIGN (label);
+	  align_flags alignment
+	= has_fallthru ? JUMP_ALIGN (label) : LOOP_ALIGN (label);
 	  if (dump_file)
-	fprintf (dump_file, "  jump alignment added.\n");
-	  max_alignment = align_flags::max (max_alignment, alignment);
-	}
-  /* In case block is frequent and reached mostly by non-fallthru edge,
-	 align it.  It is most likely a first block of loop.  */
-  if (has_fallthru
-	  && !(single_succ_p (bb)
-	   && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
-	  && optimize_bb_for_speed_p (bb)
-	  && branch_count + fallthru_count > count_threshold
-	  && (branch_count
-	  > fallthru_count.apply_scale
-		(PARAM_VALUE (PARAM_ALIGN_LOOP_ITERATIONS), 1)))
-	{
-	  align_flags alignment = LOOP_ALIGN (label);
-	  if (dump_file)
-	fprintf (dump_file, "  internal loop alignment added.\n");
+	fprintf (dump_file, "  %s alignment added.\n",
+		 has_fallthru ? "loop" : "internal 

Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-11 Thread Richard Biener
On Wed, Jul 10, 2019 at 5:52 PM Richard Biener
 wrote:
>
> On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz  wrote:
> >Hi,
> >
> >On Tue, 9 Jul 2019, Richard Biener wrote:
> >
> >> >The basic block index is not a DFS index, so no, that's not a test
> >for
> >> >backedge.
> >>
> >> I think in CFG RTL mode the BB index designates the order of the BBs
> >in
> >> the object file? So this is a way to identify backwards jumps?
> >
> >Even if it means a backwards jump (and that's not always the case, the
> >insns are emitted by following the NEXT_INSN links, without a CFG, and
> >that all happens after machine-dependend reorg, and going out of cfg
> >layout might link insn together even from high index BBs to low index
> >BBs
> >(e.g. because of fall-through)), that's still not a backedge in the
> >general case.  If a heuristic is enough here it might be okay, though.
> >
> >OTOH, as here a CFG still exists, why not simply rely on a proper DFS
> >marking backedges?
>
> Because proper backedges is not what we want here, see honzas example.
>
> So I'm second-guessing why we have different LOOP_ALIGN and when it makes 
> sense to apply.

So docs have

@defmac JUMP_ALIGN (@var{label})
The alignment (log base 2) to put in front of @var{label}, which is
a common destination of jumps and has no fallthru incoming edge.

...

@defmac LOOP_ALIGN (@var{label})
The alignment (log base 2) to put in front of @var{label} that heads
a frequently executed basic block (usually the header of a loop).

so I would expect the alignment pass to have

  if ( (branch_count > count_threshold
  || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
  && (bb->prev_bb->count
  <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
   ->count.apply_scale (1, 2)
{
  align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
LOOP_ALIGN (label);
  if (dump_file)
fprintf (dump_file, "  jump alignment added.\n");
  max_alignment = align_flags::max (max_alignment, alignment);
}

instead of the two different conditions?  Or the JUMP_ALIGN case
computing "common destination" instead of "frequently executed"
somehow but I think it makes sense that those are actually the same
here (frequently executed).  It oddly looks at prev_bb and is not
guarded with optimize_bb_for_speed_p at all so this all is
full of heuristics and changing anything here just based on x86
measurements is surely going to cause issues for targets more
sensitive to (mis-)alignment.

Richard.

> Richard.
>
> >
> >Ciao,
> >Michael.
>


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-10 Thread Richard Biener
On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz  wrote:
>Hi,
>
>On Tue, 9 Jul 2019, Richard Biener wrote:
>
>> >The basic block index is not a DFS index, so no, that's not a test
>for 
>> >backedge.
>> 
>> I think in CFG RTL mode the BB index designates the order of the BBs
>in 
>> the object file? So this is a way to identify backwards jumps?
>
>Even if it means a backwards jump (and that's not always the case, the 
>insns are emitted by following the NEXT_INSN links, without a CFG, and 
>that all happens after machine-dependend reorg, and going out of cfg 
>layout might link insn together even from high index BBs to low index
>BBs 
>(e.g. because of fall-through)), that's still not a backedge in the 
>general case.  If a heuristic is enough here it might be okay, though.
>
>OTOH, as here a CFG still exists, why not simply rely on a proper DFS 
>marking backedges?

Because proper backedges is not what we want here, see honzas example. 

So I'm second-guessing why we have different LOOP_ALIGN and when it makes sense 
to apply. 

Richard. 

>
>Ciao,
>Michael.



Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-10 Thread Michael Matz
Hi,

On Tue, 9 Jul 2019, Richard Biener wrote:

> >The basic block index is not a DFS index, so no, that's not a test for 
> >backedge.
> 
> I think in CFG RTL mode the BB index designates the order of the BBs in 
> the object file? So this is a way to identify backwards jumps?

Even if it means a backwards jump (and that's not always the case, the 
insns are emitted by following the NEXT_INSN links, without a CFG, and 
that all happens after machine-dependend reorg, and going out of cfg 
layout might link insn together even from high index BBs to low index BBs 
(e.g. because of fall-through)), that's still not a backedge in the 
general case.  If a heuristic is enough here it might be okay, though.

OTOH, as here a CFG still exists, why not simply rely on a proper DFS 
marking backedges?


Ciao,
Michael.


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Richard Biener
On July 9, 2019 3:10:19 PM GMT+02:00, Michael Matz  wrote:
>Hi,
>
>On Tue, 9 Jul 2019, Richard Biener wrote:
>
>> > So a "backedge" in this sense would be e->dest->index <
>e->src->index.
>> > No?
>> 
>> To me the following would make sense.
>
>The basic block index is not a DFS index, so no, that's not a test for 
>backedge.

I think in CFG RTL mode the BB index designates the order of the BBs in the 
object file? So this is a way to identify backwards jumps? 

Richard. 

>
>Ciao,
>Michael.



Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Michael Matz
Hi,

On Tue, 9 Jul 2019, Richard Biener wrote:

> > So a "backedge" in this sense would be e->dest->index < e->src->index.
> > No?
> 
> To me the following would make sense.

The basic block index is not a DFS index, so no, that's not a test for 
backedge.


Ciao,
Michael.


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Richard Biener
On Tue, Jul 9, 2019 at 12:23 PM Richard Biener
 wrote:
>
> On Tue, Jul 9, 2019 at 12:22 PM Richard Biener
>  wrote:
> >
> > On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka  wrote:
> > >
> > > > Hi.
> > > >
> > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > > > basic blocks for which it makes the biggest sense. I quite some binary
> > > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also 
> > > > slightly
> > > > positive.
> > > >
> > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > >
> > > > Ready to be installed?
> > > The original idea of distinction between jump alignment and loop
> > > alignment was that they have two basic meanings:
> > >  1) jump alignment is there to avoid jumping just to the end of decode
> > >  window (if the window is aligned) so CPU will get stuck after reaching
> > >  the jump and also to possibly reduce code cache polution by populating
> > >  by code that is not executed
> > >  2) loop alignment aims to fit loop in as few cache windows as possible
> > >
> > > Now if you have loop laid in a way that header of loop is not first
> > > basic block, 2) IMO still apply.  I.e.
> > >
> > > jump loop
> > > :loopback
> > > loop body
> > > :loop
> > > if cond jump to loopback
> > >
> > > So dropping loop alignment for those does not seem to make much sense
> > > from high level.  We may want to have differnt alignment for loops
> > > starting by header and loops starting in the middle, but I still liked
> > > more your patch which did bundles for loops.
> > >
> > > modern x86 chips are not very good testing targets on it.  I guess
> > > generic changes to alignment needs to be tested on other chips too.
> > >
> > > Honza
> > > > Thanks,
> > > > Martin
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-07-09  Martin Liska  
> > > >
> > > >   * final.c (compute_alignments): Apply the LOOP_ALIGN only
> > > >   to basic blocks that all loop headers.
> > > > ---
> > > >  gcc/final.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > >
> > >
> > > > diff --git a/gcc/final.c b/gcc/final.c
> > > > index fefc4874b24..ce2678da988 100644
> > > > --- a/gcc/final.c
> > > > +++ b/gcc/final.c
> > > > @@ -739,6 +739,7 @@ compute_alignments (void)
> > > >if (has_fallthru
> > > > && !(single_succ_p (bb)
> > > >  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > > +   && bb->loop_father->header == bb
> >
> > I agree that the above is the wrong condition - but I'm not sure we
> > only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
> > edge.  Note that DFS_BACK would have to be applied considering
> > the current CFG layout, simply doing mark_dfs_back_edges doesn't
> > work (we're in CFG layout mode here, no?).
>
> So a "backedge" in this sense would be e->dest->index < e->src->index.
> No?

To me the following would make sense.

Index: gcc/final.c
===
--- gcc/final.c (revision 273294)
+++ gcc/final.c (working copy)
@@ -669,6 +669,7 @@ compute_alignments (void)
 {
   rtx_insn *label = BB_HEAD (bb);
   bool has_fallthru = 0;
+  bool has_backedge = 0;
   edge e;
   edge_iterator ei;

@@ -693,6 +694,8 @@ compute_alignments (void)
has_fallthru = 1, fallthru_count += e->count ();
  else
branch_count += e->count ();
+ if (e->src->index > bb->index)
+   has_backedge = 1;
}
   if (dump_file)
{
@@ -736,7 +739,7 @@ compute_alignments (void)
}
   /* In case block is frequent and reached mostly by non-fallthru edge,
 align it.  It is most likely a first block of loop.  */
-  if (has_fallthru
+  if (has_backedge
  && !(single_succ_p (bb)
   && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
  && optimize_bb_for_speed_p (bb)


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Richard Biener
On Tue, Jul 9, 2019 at 12:22 PM Richard Biener
 wrote:
>
> On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka  wrote:
> >
> > > Hi.
> > >
> > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > > basic blocks for which it makes the biggest sense. I quite some binary
> > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > > positive.
> > >
> > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > >
> > > Ready to be installed?
> > The original idea of distinction between jump alignment and loop
> > alignment was that they have two basic meanings:
> >  1) jump alignment is there to avoid jumping just to the end of decode
> >  window (if the window is aligned) so CPU will get stuck after reaching
> >  the jump and also to possibly reduce code cache polution by populating
> >  by code that is not executed
> >  2) loop alignment aims to fit loop in as few cache windows as possible
> >
> > Now if you have loop laid in a way that header of loop is not first
> > basic block, 2) IMO still apply.  I.e.
> >
> > jump loop
> > :loopback
> > loop body
> > :loop
> > if cond jump to loopback
> >
> > So dropping loop alignment for those does not seem to make much sense
> > from high level.  We may want to have differnt alignment for loops
> > starting by header and loops starting in the middle, but I still liked
> > more your patch which did bundles for loops.
> >
> > modern x86 chips are not very good testing targets on it.  I guess
> > generic changes to alignment needs to be tested on other chips too.
> >
> > Honza
> > > Thanks,
> > > Martin
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-07-09  Martin Liska  
> > >
> > >   * final.c (compute_alignments): Apply the LOOP_ALIGN only
> > >   to basic blocks that all loop headers.
> > > ---
> > >  gcc/final.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > >
> >
> > > diff --git a/gcc/final.c b/gcc/final.c
> > > index fefc4874b24..ce2678da988 100644
> > > --- a/gcc/final.c
> > > +++ b/gcc/final.c
> > > @@ -739,6 +739,7 @@ compute_alignments (void)
> > >if (has_fallthru
> > > && !(single_succ_p (bb)
> > >  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > +   && bb->loop_father->header == bb
>
> I agree that the above is the wrong condition - but I'm not sure we
> only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
> edge.  Note that DFS_BACK would have to be applied considering
> the current CFG layout, simply doing mark_dfs_back_edges doesn't
> work (we're in CFG layout mode here, no?).

So a "backedge" in this sense would be e->dest->index < e->src->index.
No?

> Eventually the code
> counting brances effectively already does this though.
>
> The odd thing is that we apply LOOP_ALIGN only to blocks that
> have a fallthru incoming edge.  I don't see Honzas example
> above having one.
>
> > > && optimize_bb_for_speed_p (bb)
> > > && branch_count + fallthru_count > count_threshold
> > > && (branch_count
> > >
> >
> >
> >


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Richard Biener
On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka  wrote:
>
> > Hi.
> >
> > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > basic blocks for which it makes the biggest sense. I quite some binary
> > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > positive.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> The original idea of distinction between jump alignment and loop
> alignment was that they have two basic meanings:
>  1) jump alignment is there to avoid jumping just to the end of decode
>  window (if the window is aligned) so CPU will get stuck after reaching
>  the jump and also to possibly reduce code cache polution by populating
>  by code that is not executed
>  2) loop alignment aims to fit loop in as few cache windows as possible
>
> Now if you have loop laid in a way that header of loop is not first
> basic block, 2) IMO still apply.  I.e.
>
> jump loop
> :loopback
> loop body
> :loop
> if cond jump to loopback
>
> So dropping loop alignment for those does not seem to make much sense
> from high level.  We may want to have differnt alignment for loops
> starting by header and loops starting in the middle, but I still liked
> more your patch which did bundles for loops.
>
> modern x86 chips are not very good testing targets on it.  I guess
> generic changes to alignment needs to be tested on other chips too.
>
> Honza
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-07-09  Martin Liska  
> >
> >   * final.c (compute_alignments): Apply the LOOP_ALIGN only
> >   to basic blocks that all loop headers.
> > ---
> >  gcc/final.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> >
>
> > diff --git a/gcc/final.c b/gcc/final.c
> > index fefc4874b24..ce2678da988 100644
> > --- a/gcc/final.c
> > +++ b/gcc/final.c
> > @@ -739,6 +739,7 @@ compute_alignments (void)
> >if (has_fallthru
> > && !(single_succ_p (bb)
> >  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > +   && bb->loop_father->header == bb

I agree that the above is the wrong condition - but I'm not sure we
only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
edge.  Note that DFS_BACK would have to be applied considering
the current CFG layout, simply doing mark_dfs_back_edges doesn't
work (we're in CFG layout mode here, no?).  Eventually the code
counting brances effectively already does this though.

The odd thing is that we apply LOOP_ALIGN only to blocks that
have a fallthru incoming edge.  I don't see Honzas example
above having one.

> > && optimize_bb_for_speed_p (bb)
> > && branch_count + fallthru_count > count_threshold
> > && (branch_count
> >
>
>
>


Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Martin Liška
On 7/9/19 11:56 AM, Jan Hubicka wrote:
>> Hi.
>>
>> I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
>> basic blocks for which it makes the biggest sense. I quite some binary
>> size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
>> positive.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> The original idea of distinction between jump alignment and loop
> alignment was that they have two basic meanings:
>  1) jump alignment is there to avoid jumping just to the end of decode
>  window (if the window is aligned) so CPU will get stuck after reaching
>  the jump and also to possibly reduce code cache polution by populating
>  by code that is not executed
>  2) loop alignment aims to fit loop in as few cache windows as possible
> 
> Now if you have loop laid in a way that header of loop is not first
> basic block, 2) IMO still apply.  I.e.
> 
>   jump loop
> :loopback
>   loop body
> :loop
>   if cond jump to loopback
> 
> So dropping loop alignment for those does not seem to make much sense
> from high level.  We may want to have differnt alignment for loops
> starting by header and loops starting in the middle,

That's quite complicated condition, I would not introduce a new alignment.

> but I still liked
> more your patch which did bundles for loops.

The patch caused regression for quite some benchmarks and has it's own
problems (need of a recent GAS, not doing a bundle for bundles that
can't fit in a single bundle window). For that reasons, I decided
to not work on it any longer.

Martin

> 
> modern x86 chips are not very good testing targets on it.  I guess
> generic changes to alignment needs to be tested on other chips too.
> 
> Honza
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-07-09  Martin Liska  
>>
>>  * final.c (compute_alignments): Apply the LOOP_ALIGN only
>>  to basic blocks that all loop headers.
>> ---
>>  gcc/final.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>>
> 
>> diff --git a/gcc/final.c b/gcc/final.c
>> index fefc4874b24..ce2678da988 100644
>> --- a/gcc/final.c
>> +++ b/gcc/final.c
>> @@ -739,6 +739,7 @@ compute_alignments (void)
>>if (has_fallthru
>>&& !(single_succ_p (bb)
>> && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
>> +  && bb->loop_father->header == bb
>>&& optimize_bb_for_speed_p (bb)
>>&& branch_count + fallthru_count > count_threshold
>>&& (branch_count
>>
> 
> 
> 



Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.

2019-07-09 Thread Jan Hubicka
> Hi.
> 
> I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> basic blocks for which it makes the biggest sense. I quite some binary
> size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> positive.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
The original idea of distinction between jump alignment and loop
alignment was that they have two basic meanings:
 1) jump alignment is there to avoid jumping just to the end of decode
 window (if the window is aligned) so CPU will get stuck after reaching
 the jump and also to possibly reduce code cache polution by populating
 by code that is not executed
 2) loop alignment aims to fit loop in as few cache windows as possible

Now if you have loop laid in a way that header of loop is not first
basic block, 2) IMO still apply.  I.e.

jump loop
:loopback
loop body
:loop
if cond jump to loopback

So dropping loop alignment for those does not seem to make much sense
from high level.  We may want to have differnt alignment for loops
starting by header and loops starting in the middle, but I still liked
more your patch which did bundles for loops.

modern x86 chips are not very good testing targets on it.  I guess
generic changes to alignment needs to be tested on other chips too.

Honza
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-07-09  Martin Liska  
> 
>   * final.c (compute_alignments): Apply the LOOP_ALIGN only
>   to basic blocks that all loop headers.
> ---
>  gcc/final.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> 

> diff --git a/gcc/final.c b/gcc/final.c
> index fefc4874b24..ce2678da988 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -739,6 +739,7 @@ compute_alignments (void)
>if (has_fallthru
> && !(single_succ_p (bb)
>  && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> +   && bb->loop_father->header == bb
> && optimize_bb_for_speed_p (bb)
> && branch_count + fallthru_count > count_threshold
> && (branch_count
>