Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-30 Thread Jeff Law
On 06/30/2017 03:03 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 14:46, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
>>
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
>>
>>  * rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
>>  comparison set and one other set, use the cost of the
>>  non-comparison set.
>>
>> Bootstrapped on aarch64-none-linuxgnu
>>
>> OK?
>>
> 
> Ping?
Needs a ChangeLog, with that, OK.  Ideally we'd have something which
verifies we're getting the cost right, but that's probably nontrivial to
do in a generic manner.

jeff


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-30 Thread Richard Earnshaw (lists)
On 19/06/17 14:46, Richard Earnshaw (lists) wrote:
> Many parallel set insns are of the form of a single set that also sets
> the condition code flags.  In this case the cost of such an insn is
> normally the cost of the part that doesn't set the flags, since updating
> the condition flags is simply a side effect.
> 
> At present all such insns are treated as having unknown cost (ie 0) and
> combine assumes that such insns are infinitely more expensive than any
> other insn sequence with a non-zero cost.
> 
> This patch addresses this problem by allowing insn_rtx_cost to ignore
> the condition setting part of a PARALLEL iff there is exactly one
> comparison set and one non-comparison set.  If the only set operation is
> a comparison we still use that as the basis of the insn cost.
> 
>   * rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
>   comparison set and one other set, use the cost of the
>   non-comparison set.
> 
> Bootstrapped on aarch64-none-linuxgnu
> 
> OK?
> 

Ping?

R.

> R.
> 
> 
> insn-costs.patch
> 
> 
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index d9f57c3..5cae793 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -5260,23 +5260,41 @@ insn_rtx_cost (rtx pat, bool speed)
>int i, cost;
>rtx set;
>  
> -  /* Extract the single set rtx from the instruction pattern.
> - We can't use single_set since we only have the pattern.  */
> +  /* Extract the single set rtx from the instruction pattern.  We
> + can't use single_set since we only have the pattern.  We also
> + consider PARALLELs of a normal set and and a single comparison.
> + In that case we use the cost of the non-comparison SET operation,
> + which is most-likely to be the real cost of this operation.  */
>if (GET_CODE (pat) == SET)
>  set = pat;
>else if (GET_CODE (pat) == PARALLEL)
>  {
>set = NULL_RTX;
> +  rtx comparison = NULL_RTX;
> +
>for (i = 0; i < XVECLEN (pat, 0); i++)
>   {
> rtx x = XVECEXP (pat, 0, i);
> if (GET_CODE (x) == SET)
>   {
> -   if (set)
> - return 0;
> -   set = x;
> +   if (GET_CODE (SET_SRC (x)) == COMPARE)
> + {
> +   if (comparison)
> + return 0;
> +   comparison = x;
> + }
> +   else
> + {
> +   if (set)
> + return 0;
> +   set = x;
> + }
>   }
>   }
> +
> +  if (!set && comparison)
> + set = comparison;
> +
>if (!set)
>   return 0;
>  }
> 



Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-20 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 12:40:53PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> > Yeah, and I'm not suggesting we change the logic there (sorry if the
> > description was misleading).  Instead I'm proposing that we handle more
> > cases for parallels to not return zero.
> 
> Right.  My test run is half way through, will have results later --
> your change looks good to me, but it is always surprising whether
> better costs help or not, or even *hurt* good code generation (things
> are just too tightly tuned to the current behaviour, so some things
> may need retuning).

Everything built successfully (31 targets); --enable-checking=yes,rtl,tree
so it took a while, sorry.

The targets with any differences (table shows code size):

 old patched
 arm  11545709  11545797
 powerpc   8442762   8442746
  x86_64  10627428  10627363

Arm has very many differences, the others do not.  For powerpc (which
is 32-bit, 64-bit showed no differences) most of the difference is
scheduling deciding to do things a bit differently, and most of it
in places where we have not-so-good costs anyway.  For arm the effects
often cascade to bb-reorder making different decisions.

Anyway, all differences are small, it is not likely to hurt anything.
I support the patch, if that helps -- but I cannot approve it.


Segher


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> Yeah, and I'm not suggesting we change the logic there (sorry if the
> description was misleading).  Instead I'm proposing that we handle more
> cases for parallels to not return zero.

Right.  My test run is half way through, will have results later --
your change looks good to me, but it is always surprising whether
better costs help or not, or even *hurt* good code generation (things
are just too tightly tuned to the current behaviour, so some things
may need retuning).


Segher


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 16:09, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
 At present all such insns are treated as having unknown cost (ie 0) and
 combine assumes that such insns are infinitely more expensive than any
 other insn sequence with a non-zero cost.
>>>
>>> That's not what combine does: it optimistically assumes any combination
>>> with unknown costs is an improvement.
>>
>> Actually the logic is
>>
>>   int reject = old_cost > 0 && new_cost > old_cost;
>>
>> So reject will never be true if old cost is zero.
> 
> Yes, exactly; and neither if new_cost is zero.  If any cost is unknown
> combine just hopes for the best.
> 
> 
> Segher
> 

Yeah, and I'm not suggesting we change the logic there (sorry if the
description was misleading).  Instead I'm proposing that we handle more
cases for parallels to not return zero.

R.


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
> >> At present all such insns are treated as having unknown cost (ie 0) and
> >> combine assumes that such insns are infinitely more expensive than any
> >> other insn sequence with a non-zero cost.
> > 
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
> 
> Actually the logic is
> 
>   int reject = old_cost > 0 && new_cost > old_cost;
> 
> So reject will never be true if old cost is zero.

Yes, exactly; and neither if new_cost is zero.  If any cost is unknown
combine just hopes for the best.


Segher


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 03:28:20PM +0100, Richard Earnshaw (lists) wrote:
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
> 
> So try this testcase on ARM.
> 
> unsigned long x, y, z;
> int b;
> void test()
> {
>b = __builtin_sub_overflow (y,z, &x);
> }
> 
> 
> Without the patch, combine rips apart a compare and subtract insn
> because it sees it as having cost zero and substitutes it with separate
> compare and subtract insns.

> The combine log before the patch shows:
> 
> allowing combination of insns 10 and 51
> original costs 0 + 8 = 0
> replacement costs 4 + 12 = 16

Yes, this is a good example of a case where your patch helps.  Thanks.

> So it is clearly deciding that the original costs are greater than the
> replacement costs.

No: it allows any combination with unknown cost (either old or new cost).
See combine_validate_cost.

> >> This patch addresses this problem by allowing insn_rtx_cost to ignore
> >> the condition setting part of a PARALLEL iff there is exactly one
> >> comparison set and one non-comparison set.  If the only set operation is
> >> a comparison we still use that as the basis of the insn cost.
> > 
> > I'll test this on a zillion archs, see what the effect is.
> > 
> > Have you considered costing general parallels as well?
> 
> I thought about it but concluded that there's no generically correct
> answer.  It might be the max of all the individual sets or it might be
> the sum, or it might be somewhere in between.  For example on ARM the
> load/store multiple operations are expressed as parallels, but their
> cost will depend on how many loads/stores happen in parallel within the
> hardware.
> 
> I think we'd need a new back-end hook to handle the other cases sensibly.

And in general make insn_rtx_cost do something more sane than just looking
at a set_src_cost, yeah.

The problem is changing any of this without regressing some targets.
Of course we are in stage 1 now ;-)


Segher


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
> 
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.

Actually the logic is

  int reject = old_cost > 0 && new_cost > old_cost;


So reject will never be true if old cost is zero.

R.
> 
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
> 
> I'll test this on a zillion archs, see what the effect is.
> 
> Have you considered costing general parallels as well?
> 
> 
> Segher
> 



Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
> 
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.

So try this testcase on ARM.

unsigned long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}


Without the patch, combine rips apart a compare and subtract insn
because it sees it as having cost zero and substitutes it with separate
compare and subtract insns.

ie before:


ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
cmp r3, r2<=
movcs   r0, #0
movcc   r0, #1
ldr ip, .L5+8
ldr r1, .L5+12
sub r3, r3, r2  <=
str r3, [ip]
str r0, [r1]
bx  lr

after:

ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
subsr3, r3, r2  <
movcc   r1, #1
movcs   r1, #0
ldr r0, .L5+8
ldr r2, .L5+12
str r3, [r0]
str r1, [r2]
bx  lr

The combine log before the patch shows:

allowing combination of insns 10 and 51
original costs 0 + 8 = 0
replacement costs 4 + 12 = 16

So it is clearly deciding that the original costs are greater than the
replacement costs.

> 
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
> 
> I'll test this on a zillion archs, see what the effect is.
> 
> Have you considered costing general parallels as well?
> 
> 

I thought about it but concluded that there's no generically correct
answer.  It might be the max of all the individual sets or it might be
the sum, or it might be somewhere in between.  For example on ARM the
load/store multiple operations are expressed as parallels, but their
cost will depend on how many loads/stores happen in parallel within the
hardware.

I think we'd need a new back-end hook to handle the other cases sensibly.

R.

> Segher
> 



Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
Hi!

On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
> Many parallel set insns are of the form of a single set that also sets
> the condition code flags.  In this case the cost of such an insn is
> normally the cost of the part that doesn't set the flags, since updating
> the condition flags is simply a side effect.
> 
> At present all such insns are treated as having unknown cost (ie 0) and
> combine assumes that such insns are infinitely more expensive than any
> other insn sequence with a non-zero cost.

That's not what combine does: it optimistically assumes any combination
with unknown costs is an improvement.

> This patch addresses this problem by allowing insn_rtx_cost to ignore
> the condition setting part of a PARALLEL iff there is exactly one
> comparison set and one non-comparison set.  If the only set operation is
> a comparison we still use that as the basis of the insn cost.

I'll test this on a zillion archs, see what the effect is.

Have you considered costing general parallels as well?


Segher


[rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
Many parallel set insns are of the form of a single set that also sets
the condition code flags.  In this case the cost of such an insn is
normally the cost of the part that doesn't set the flags, since updating
the condition flags is simply a side effect.

At present all such insns are treated as having unknown cost (ie 0) and
combine assumes that such insns are infinitely more expensive than any
other insn sequence with a non-zero cost.

This patch addresses this problem by allowing insn_rtx_cost to ignore
the condition setting part of a PARALLEL iff there is exactly one
comparison set and one non-comparison set.  If the only set operation is
a comparison we still use that as the basis of the insn cost.

* rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
comparison set and one other set, use the cost of the
non-comparison set.

Bootstrapped on aarch64-none-linuxgnu

OK?

R.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index d9f57c3..5cae793 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5260,23 +5260,41 @@ insn_rtx_cost (rtx pat, bool speed)
   int i, cost;
   rtx set;
 
-  /* Extract the single set rtx from the instruction pattern.
- We can't use single_set since we only have the pattern.  */
+  /* Extract the single set rtx from the instruction pattern.  We
+ can't use single_set since we only have the pattern.  We also
+ consider PARALLELs of a normal set and and a single comparison.
+ In that case we use the cost of the non-comparison SET operation,
+ which is most-likely to be the real cost of this operation.  */
   if (GET_CODE (pat) == SET)
 set = pat;
   else if (GET_CODE (pat) == PARALLEL)
 {
   set = NULL_RTX;
+  rtx comparison = NULL_RTX;
+
   for (i = 0; i < XVECLEN (pat, 0); i++)
 	{
 	  rtx x = XVECEXP (pat, 0, i);
 	  if (GET_CODE (x) == SET)
 	{
-	  if (set)
-		return 0;
-	  set = x;
+	  if (GET_CODE (SET_SRC (x)) == COMPARE)
+		{
+		  if (comparison)
+		return 0;
+		  comparison = x;
+		}
+	  else
+		{
+		  if (set)
+		return 0;
+		  set = x;
+		}
 	}
 	}
+
+  if (!set && comparison)
+	set = comparison;
+
   if (!set)
 	return 0;
 }