Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-02 Thread Jeff Law

On 12/02/2016 03:22 PM, Segher Boessenkool wrote:

On Fri, Dec 02, 2016 at 09:47:00AM +0100, Richard Biener wrote:

STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.


I hope it special-cases backedges, that is, not make those fallthru.


It doesn't, and that is a good thing.

Firstly, what is classified as backedge doesn't make too much sense in
some cases; quite often the cfg isn't reducible.

Secondly, for simple loops it does not matter: the backedge has lower
frequency than the forward edges, so everything works out as you want.

Thirdly, consider this loop:

|
S<
| |
A |
   / \|
  B   C   |
   \ /|
D |
| |
E-
|
F

where the backedge now has higher frequency than A->B and A->C.
With simple this ends up as:

S: ...; goto A

D: ...
E: ...; if (not again) goto F
S: ...
A: ...; if () goto C
B: ...; goto D

C: ...; goto D

and this is cheaper than the alternative:

S: ...
A: ...; if () goto C
B: ...
D: ...
E: ...; if (again) goto A
F:

C: ...; goto D

If a fraction f of the loop iterations does C (so 1-f does B), the
alternative has 1+2f jumps per iteration; the code simple makes has 1+f.
In fact, the latter can be particularly bad with small icaches.  The 
first sequence is far better.  IIRC there were spec92 cases there 
getting this kind of thing right made a measurable difference on 
embedded targets.


Jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-02 Thread Segher Boessenkool
On Fri, Dec 02, 2016 at 09:47:00AM +0100, Richard Biener wrote:
> >> STC tries to make as large as possible consecutive "traces", mainly to
> >> help with instruction cache utilization and hit rate etc.  It cannot do
> >> a very good job if it isn't allowed to copy blocks.
> >>
> >> "simple" tries to (dynamically) have as many fall-throughs as possible,
> >> i.e. as few jumps as possible.
> 
> I hope it special-cases backedges, that is, not make those fallthru.

It doesn't, and that is a good thing.

Firstly, what is classified as backedge doesn't make too much sense in
some cases; quite often the cfg isn't reducible.

Secondly, for simple loops it does not matter: the backedge has lower
frequency than the forward edges, so everything works out as you want.

Thirdly, consider this loop:

|
S<
| |
A |
   / \|
  B   C   |
   \ /|
D |
| |
E-
|
F

where the backedge now has higher frequency than A->B and A->C.
With simple this ends up as:

S: ...; goto A

D: ...
E: ...; if (not again) goto F
S: ...
A: ...; if () goto C
B: ...; goto D

C: ...; goto D

and this is cheaper than the alternative:

S: ...
A: ...; if () goto C
B: ...
D: ...
E: ...; if (again) goto A
F:

C: ...; goto D

If a fraction f of the loop iterations does C (so 1-f does B), the
alternative has 1+2f jumps per iteration; the code simple makes has 1+f.

> > I think we've probably discussed this more than is really necessary.  We
> > just need to pick an alternative and go with it, I think either alternative
> > is reasonable (avoid copying when STC has no length information or fall back
> > to simple when there is no length information).
> 
> The description of both algorithms doesn't make it an obvious pick.  So lets
> leave the decision of the algorithm to the target and make STC beave sane
> with no length information (whether that means disallowing any copying or
> substituting a "default" length is another question -- but obviously it's the
> targets fault to not provide lenght information).

I agree.


Segher


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-02 Thread Richard Biener
On Thu, Dec 1, 2016 at 6:28 PM, Jeff Law  wrote:
> On 12/01/2016 05:04 AM, Segher Boessenkool wrote:
>>
>> On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:
>
> Thinking about this again maybe targets w/o insn-length should simply
> always use the 'simple' algorithm instead of the STV one?  At least
> that
> might be what your change effectively does in some way?


 From reading the comments I don't think STC will collapse down into the
 simple algorithm if block copying is disabled.  But Segher would know
 for
 sure.

 WRT the choice of simple vs STC, I doubt it matters much for the
 processors
 in question.
>>>
>>>
>>> I guess STC doesn't make much sense if we can't say anything about BB
>>> sizes.
>>
>>
>> STC tries to make as large as possible consecutive "traces", mainly to
>> help with instruction cache utilization and hit rate etc.  It cannot do
>> a very good job if it isn't allowed to copy blocks.
>>
>> "simple" tries to (dynamically) have as many fall-throughs as possible,
>> i.e. as few jumps as possible.

I hope it special-cases backedges, that is, not make those fallthru.

>>  It never copies code; if that means it
>> has to jump every second insn, so be it.  It provably is within a factor
>> three of optimal (optimal is NP-hard), under a really weak assumption
>> within a factor two, and it does better than that in practice.
>>
>> STC without block copying makes longer traces which is not a good idea
>> for most architectures, only for those that have a short jump that is
>> much shorter than longer jumps (and thus, cannot cover many jump
>> targets).
>>
>> I do not know how STC behaves when it does not know the insn lengths.
>
> mn103 &  m68k are definitely sensitive to jump distances, the former more so
> than the latter.  Some of the others probably are as well.
>
> I think we've probably discussed this more than is really necessary.  We
> just need to pick an alternative and go with it, I think either alternative
> is reasonable (avoid copying when STC has no length information or fall back
> to simple when there is no length information).

The description of both algorithms doesn't make it an obvious pick.  So lets
leave the decision of the algorithm to the target and make STC beave sane
with no length information (whether that means disallowing any copying or
substituting a "default" length is another question -- but obviously it's the
targets fault to not provide lenght information).

Richard.

>
>
> jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Jeff Law

On 12/01/2016 05:04 AM, Segher Boessenkool wrote:

On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:

Thinking about this again maybe targets w/o insn-length should simply
always use the 'simple' algorithm instead of the STV one?  At least that
might be what your change effectively does in some way?


From reading the comments I don't think STC will collapse down into the
simple algorithm if block copying is disabled.  But Segher would know for
sure.

WRT the choice of simple vs STC, I doubt it matters much for the processors
in question.


I guess STC doesn't make much sense if we can't say anything about BB sizes.


STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.  It never copies code; if that means it
has to jump every second insn, so be it.  It provably is within a factor
three of optimal (optimal is NP-hard), under a really weak assumption
within a factor two, and it does better than that in practice.

STC without block copying makes longer traces which is not a good idea
for most architectures, only for those that have a short jump that is
much shorter than longer jumps (and thus, cannot cover many jump
targets).

I do not know how STC behaves when it does not know the insn lengths.
mn103 &  m68k are definitely sensitive to jump distances, the former 
more so than the latter.  Some of the others probably are as well.


I think we've probably discussed this more than is really necessary.  We 
just need to pick an alternative and go with it, I think either 
alternative is reasonable (avoid copying when STC has no length 
information or fall back to simple when there is no length information).




jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:
> >> Thinking about this again maybe targets w/o insn-length should simply
> >> always use the 'simple' algorithm instead of the STV one?  At least that
> >> might be what your change effectively does in some way?
> >
> > From reading the comments I don't think STC will collapse down into the
> > simple algorithm if block copying is disabled.  But Segher would know for
> > sure.
> >
> > WRT the choice of simple vs STC, I doubt it matters much for the processors
> > in question.
> 
> I guess STC doesn't make much sense if we can't say anything about BB sizes.

STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.  It never copies code; if that means it
has to jump every second insn, so be it.  It provably is within a factor
three of optimal (optimal is NP-hard), under a really weak assumption
within a factor two, and it does better than that in practice.

STC without block copying makes longer traces which is not a good idea
for most architectures, only for those that have a short jump that is
much shorter than longer jumps (and thus, cannot cover many jump
targets).

I do not know how STC behaves when it does not know the insn lengths.


Segher


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Richard Biener
On Wed, Nov 30, 2016 at 6:59 PM, Jeff Law  wrote:
> On 11/30/2016 01:38 AM, Richard Biener wrote:
>>
>> On Tue, Nov 29, 2016 at 5:07 PM, Jeff Law  wrote:
>>>
>>> On 11/29/2016 03:23 AM, Richard Biener wrote:


 On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>
>
>
>
> I was digging into  issues around the patches for 78120 when I stumbled
> upon
> undesirable bb copying in bb-reorder.c on the m68k.
>
> The core issue is that the m68k does not define a length attribute and
> therefore generic code assumes that the length of all insns is 0 bytes.



 What other targets behave like this?
>>>
>>>
>>> ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c
>>
>>
>> Ok.
>>
>>> cris has a hack to define a length, even though no attempt is made to
>>> make
>>> it accurate.  The hack specifically calls out that it's to make
>>> bb-reorder
>>> happy.
>>>

> That in turn makes bb-reorder think it is infinitely cheap to copy
> basic
> blocks.  In the two codebases I looked at (GCC's runtime libraries and
> newlib) this leads to a 10% and 15% undesirable increase in code size.
>
> I've taken a slight variant of this patch and bootstrapped/regression
> tested
> it on x86_64-linux-gnu to verify sanity as well as built the m68k
> target
> libraries noted above.
>
> OK for the trunk?



 I wonder if it isn't better to default to a length of 1 instead of zero
 when
 there is no length attribute.  There are more users of the length
 attribute
 in bb-reorder.c (and elsewhere as well I suppose).
>>>
>>>
>>> I pondered that as well, but felt it was riskier given we've had a
>>> default
>>> length of 0 for ports that don't define lengths since the early 90s.
>>> It's
>>> certainly easy enough to change that default if you'd prefer.  I don't
>>> have
>>> a strong preference either way.
>>
>>
>> Thinking about this again maybe targets w/o insn-length should simply
>> always use the 'simple' algorithm instead of the STV one?  At least that
>> might be what your change effectively does in some way?
>
> From reading the comments I don't think STC will collapse down into the
> simple algorithm if block copying is disabled.  But Segher would know for
> sure.
>
> WRT the choice of simple vs STC, I doubt it matters much for the processors
> in question.

I guess STC doesn't make much sense if we can't say anything about BB sizes.

Richard.

>
> JEff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-11-30 Thread Jeff Law

On 11/30/2016 01:38 AM, Richard Biener wrote:

On Tue, Nov 29, 2016 at 5:07 PM, Jeff Law  wrote:

On 11/29/2016 03:23 AM, Richard Biener wrote:


On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:




I was digging into  issues around the patches for 78120 when I stumbled
upon
undesirable bb copying in bb-reorder.c on the m68k.

The core issue is that the m68k does not define a length attribute and
therefore generic code assumes that the length of all insns is 0 bytes.



What other targets behave like this?


ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c


Ok.


cris has a hack to define a length, even though no attempt is made to make
it accurate.  The hack specifically calls out that it's to make bb-reorder
happy.




That in turn makes bb-reorder think it is infinitely cheap to copy basic
blocks.  In the two codebases I looked at (GCC's runtime libraries and
newlib) this leads to a 10% and 15% undesirable increase in code size.

I've taken a slight variant of this patch and bootstrapped/regression
tested
it on x86_64-linux-gnu to verify sanity as well as built the m68k target
libraries noted above.

OK for the trunk?



I wonder if it isn't better to default to a length of 1 instead of zero
when
there is no length attribute.  There are more users of the length
attribute
in bb-reorder.c (and elsewhere as well I suppose).


I pondered that as well, but felt it was riskier given we've had a default
length of 0 for ports that don't define lengths since the early 90s.  It's
certainly easy enough to change that default if you'd prefer.  I don't have
a strong preference either way.


Thinking about this again maybe targets w/o insn-length should simply
always use the 'simple' algorithm instead of the STV one?  At least that
might be what your change effectively does in some way?
From reading the comments I don't think STC will collapse down into the 
simple algorithm if block copying is disabled.  But Segher would know 
for sure.


WRT the choice of simple vs STC, I doubt it matters much for the 
processors in question.


JEff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-11-30 Thread Richard Biener
On Tue, Nov 29, 2016 at 5:07 PM, Jeff Law  wrote:
> On 11/29/2016 03:23 AM, Richard Biener wrote:
>>
>> On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>>>
>>>
>>>
>>> I was digging into  issues around the patches for 78120 when I stumbled
>>> upon
>>> undesirable bb copying in bb-reorder.c on the m68k.
>>>
>>> The core issue is that the m68k does not define a length attribute and
>>> therefore generic code assumes that the length of all insns is 0 bytes.
>>
>>
>> What other targets behave like this?
>
> ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c

Ok.

> cris has a hack to define a length, even though no attempt is made to make
> it accurate.  The hack specifically calls out that it's to make bb-reorder
> happy.
>
>>
>>> That in turn makes bb-reorder think it is infinitely cheap to copy basic
>>> blocks.  In the two codebases I looked at (GCC's runtime libraries and
>>> newlib) this leads to a 10% and 15% undesirable increase in code size.
>>>
>>> I've taken a slight variant of this patch and bootstrapped/regression
>>> tested
>>> it on x86_64-linux-gnu to verify sanity as well as built the m68k target
>>> libraries noted above.
>>>
>>> OK for the trunk?
>>
>>
>> I wonder if it isn't better to default to a length of 1 instead of zero
>> when
>> there is no length attribute.  There are more users of the length
>> attribute
>> in bb-reorder.c (and elsewhere as well I suppose).
>
> I pondered that as well, but felt it was riskier given we've had a default
> length of 0 for ports that don't define lengths since the early 90s.  It's
> certainly easy enough to change that default if you'd prefer.  I don't have
> a strong preference either way.

Thinking about this again maybe targets w/o insn-length should simply
always use the 'simple' algorithm instead of the STV one?  At least that
might be what your change effectively does in some way?

Richard.

> Jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-11-29 Thread Jeff Law

On 11/29/2016 03:23 AM, Richard Biener wrote:

On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:



I was digging into  issues around the patches for 78120 when I stumbled upon
undesirable bb copying in bb-reorder.c on the m68k.

The core issue is that the m68k does not define a length attribute and
therefore generic code assumes that the length of all insns is 0 bytes.


What other targets behave like this?

ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c

cris has a hack to define a length, even though no attempt is made to 
make it accurate.  The hack specifically calls out that it's to make 
bb-reorder happy.





That in turn makes bb-reorder think it is infinitely cheap to copy basic
blocks.  In the two codebases I looked at (GCC's runtime libraries and
newlib) this leads to a 10% and 15% undesirable increase in code size.

I've taken a slight variant of this patch and bootstrapped/regression tested
it on x86_64-linux-gnu to verify sanity as well as built the m68k target
libraries noted above.

OK for the trunk?


I wonder if it isn't better to default to a length of 1 instead of zero when
there is no length attribute.  There are more users of the length attribute
in bb-reorder.c (and elsewhere as well I suppose).
I pondered that as well, but felt it was riskier given we've had a 
default length of 0 for ports that don't define lengths since the early 
90s.  It's certainly easy enough to change that default if you'd prefer. 
 I don't have a strong preference either way.


Jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-11-29 Thread Richard Biener
On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>
>
> I was digging into  issues around the patches for 78120 when I stumbled upon
> undesirable bb copying in bb-reorder.c on the m68k.
>
> The core issue is that the m68k does not define a length attribute and
> therefore generic code assumes that the length of all insns is 0 bytes.

What other targets behave like this?

> That in turn makes bb-reorder think it is infinitely cheap to copy basic
> blocks.  In the two codebases I looked at (GCC's runtime libraries and
> newlib) this leads to a 10% and 15% undesirable increase in code size.
>
> I've taken a slight variant of this patch and bootstrapped/regression tested
> it on x86_64-linux-gnu to verify sanity as well as built the m68k target
> libraries noted above.
>
> OK for the trunk?

I wonder if it isn't better to default to a length of 1 instead of zero when
there is no length attribute.  There are more users of the length attribute
in bb-reorder.c (and elsewhere as well I suppose).

from get_attr_length_1 it looks like a "cheap" target dependent way
would be to define ADJUST_INSN_LENGTH ...

Richard.

> Jeff
>
> * bb-reorder.c (copy_bb_p): Sanely handle case where the target
> has not defined length attributes for its insns.
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 6873b4f..0b8d1d9 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -115,6 +115,7 @@
>  #include "bb-reorder.h"
>  #include "except.h"
>  #include "fibonacci_heap.h"
> +#include "insn-attr.h"
>
>  /* The number of rounds.  In most cases there will only be 4 rounds, but
> when partitioning hot and cold basic blocks into separate sections of
> @@ -1355,6 +1356,9 @@ copy_bb_p (const_basic_block bb, int code_may_grow)
>int max_size = uncond_jump_length;
>rtx_insn *insn;
>
> +  if (!HAVE_ATTR_length)
> +return false;
> +
>if (!bb->frequency)
>  return false;
>if (EDGE_COUNT (bb->preds) < 2)
>