Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-10 Thread Vladimir N Makarov



On 08/08/2016 01:04 PM, Jiong Wang wrote:

[...]
There is a very tiny performance regression on SPEC2K6INT
464.h264ref. Checking the codegen, there is some bad instruction
scheduling; it looks to me to be caused by REG_ALLOC_ORDER not being used
consistently inside IRA, so that parts of the code use the new allocation
order while other parts still use the original order.

I see in ira.c:setup_class_hard_regs we are checking whether
REG_ALLOC_ORDER is defined:

   #ifdef REG_ALLOC_ORDER
   hard_regno = reg_alloc_order[i];
   #else
   hard_regno = i;
   #endif

but in ira-color.c:assign_hard_reg, there is no similar check:

   /* We don't care about giving callee saved registers to allocnos not
      living through calls because call clobbered registers are
      allocated first (it is usual practice to put them first in
      REG_ALLOC_ORDER).  */
   mode = ALLOCNO_MODE (a);
   for (i = 0; i < class_size; i++)
 {
   hard_regno = ira_class_hard_regs[aclass][i];

Should we use reg_alloc_order to map the "i" above when REG_ALLOC_ORDER is
defined?

Vlad, any comments?

The order is already present in ira_class_hard_regs.  So there is no 
need to use it in ira-color.c::assign_hard_reg.




Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-08 Thread Jiong Wang

Jiong Wang writes:

> Andrew Pinski writes:
>
>> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>>  wrote:
>>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
 > On Jul 27, 2015, at 2:26 AM, Jiong Wang  wrote:
 >
 > Andrew Pinski writes:
 >
 >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang  wrote:
 >>>
 >>> James Greenhalgh writes:
 >>>
 > On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
 > Current IRA still uses both target macros in a few places.
 >
 > Tell IRA to use the order we defined rather than its own cost
 > calculation. Allocate caller-saved registers first, then callee-saved.
 >
 > This is especially useful for LR/x30, as it is free to allocate and is
 > pure caller-saved when used in a leaf function.
 >
 > I haven't noticed a significant impact on benchmarks, but grepping for
 > keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
 > that the counts are smaller.
 >
 > OK for trunk?
 
  OK, sorry for the delay.
 
  It might be mail client mangling, but please check that the trailing 
  slashes
  line up in the version that gets committed.
 
  Thanks,
  James
 
 > 2015-05-19  Jiong. Wang  
 >
 > gcc/
 >  PR 63521
 >  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
 >  (HONOR_REG_ALLOC_ORDER): Define.
 >>>
 >>> Patch reverted.
 >>
 >> I did not see a reason why this patch was reverted.  Maybe I am
 >> missing an email or something.
 >
 > There are several execution regressions in the gcc testsuite, although
 > as far as I can see this patch merely exposed hidden bugs in those
 > testcases. Still, there might be another issue, so to be conservative I
 > temporarily reverted this patch.

 If you are talking about:
 gcc.target/aarch64/aapcs64/func-ret-2.c execution
 Etc.

 These test cases are too dependent on the original register allocation 
 order
 and really can be safely ignored. Really these three tests should be moved 
 or
 written in a more sane way.
>>>
>>> Yup, completely agreed - but the testcases do throw up something
>>> interesting. If we are allocating registers to hold 128-bit values, and
>>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>>> I think we probably see the same thing if the first thing we do in a
>>> function is a structure copy through a back-end expanded movmem, which
>>> will likely begin with a 128-bit LDP using x7, x8.
>>>
>>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>>> followed by x8, then we've potentially made a sub-optimal decision: our
>>> allocation order for 128-bit values becomes x7,x8,x5,x6, etc.
>>>
>>> My hunch is that we *might* get better code generation in this corner case
>>> out of some permutation of the allocation order for argument
>>> registers. I'm thinking something along the lines of
>>>
>>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>>
>>> I asked Jiong to take a look at that, and I agree with his decision to
>>> reduce the churn on trunk and just revert the patch until we've come to
>>> a conclusion based on some evidence - rather than just my hunch! I agree
>>> that it would be harmless on trunk from a testing point of view, but I
>>> think Jiong is right to revert the patch until we better understand the
>>> code-generation implications.
>>>
>>> Of course, it might be that I am completely wrong! If you've already taken
>>> a look at using a register allocation order like the example I gave and
>>> have something to share, I'd be happy to read your advice!
>>
>> Any news on this patch?  It has been a year since it was reverted for
>> a bad test that was failing.
>
> Hi Andrew,
>
>   Yeah, those tests are actually expected to fail once the register
>   allocation order changes; this is clearly documented in the comments:
>
>   gcc.target/aarch64/aapcs64/abitest-2.h:
>   
> /* ...
>
>    Note that for a value that is returned in the caller-allocated memory
>    block, we get the address from the saved x8 register.  x8 is saved
>    just after the callee returns; we assume that x8 has not been
>    clobbered by then, although there is no requirement for the callee
>    to preserve the value stored in x8.  Luckily, all test cases here
>    are simple enough that x8 doesn't normally get clobbered (although
>    this is not guaranteed).  */
>
>   I had a local fix which used the redundant value returned in x0 to
>   repair the clobbered value in x8, as the two will be identical for a
>   structure-type return. However, that trick no longer works, as we
>   recently defined TARGET_OMIT_STRUCT_RETURN_REG to true which will
>   

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-05 Thread Jiong Wang

Andrew Pinski writes:

> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>  wrote:
>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang  wrote:
>>> >
>>> > Andrew Pinski writes:
>>> >
>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang  wrote:
>>> >>>
>>> >>> James Greenhalgh writes:
>>> >>>
>>> > On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> > Current IRA still uses both target macros in a few places.
>>> >
>>> > Tell IRA to use the order we defined rather than its own cost
>>> > calculation. Allocate caller-saved registers first, then callee-saved.
>>> >
>>> > This is especially useful for LR/x30, as it is free to allocate and is
>>> > pure caller-saved when used in a leaf function.
>>> >
>>> > I haven't noticed a significant impact on benchmarks, but grepping for
>>> > keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
>>> > that the counts are smaller.
>>> >
>>> > OK for trunk?
>>> 
>>>  OK, sorry for the delay.
>>> 
>>>  It might be mail client mangling, but please check that the trailing 
>>>  slashes
>>>  line up in the version that gets committed.
>>> 
>>>  Thanks,
>>>  James
>>> 
>>> > 2015-05-19  Jiong. Wang  
>>> >
>>> > gcc/
>>> >  PR 63521
>>> >  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>> >  (HONOR_REG_ALLOC_ORDER): Define.
>>> >>>
>>> >>> Patch reverted.
>>> >>
>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>> >> missing an email or something.
>>> >
>>> > There are several execution regressions in the gcc testsuite, although
>>> > as far as I can see this patch merely exposed hidden bugs in those
>>> > testcases. Still, there might be another issue, so to be conservative I
>>> > temporarily reverted this patch.
>>>
>>> If you are talking about:
>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>> Etc.
>>>
>>> These test cases are too dependent on the original register allocation order
>>> and really can be safely ignored. Really these three tests should be moved 
>>> or
>>> written in a more sane way.
>>
>> Yup, completely agreed - but the testcases do throw up something
>> interesting. If we are allocating registers to hold 128-bit values, and
>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>> I think we probably see the same thing if the first thing we do in a
>> function is a structure copy through a back-end expanded movmem, which
>> will likely begin with a 128-bit LDP using x7, x8.
>>
>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>> followed by x8, then we've potentially made a sub-optimal decision: our
>> allocation order for 128-bit values becomes x7,x8,x5,x6, etc.
>>
>> My hunch is that we *might* get better code generation in this corner case
>> out of some permutation of the allocation order for argument
>> registers. I'm thinking something along the lines of
>>
>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>
>> I asked Jiong to take a look at that, and I agree with his decision to
>> reduce the churn on trunk and just revert the patch until we've come to
>> a conclusion based on some evidence - rather than just my hunch! I agree
>> that it would be harmless on trunk from a testing point of view, but I
>> think Jiong is right to revert the patch until we better understand the
>> code-generation implications.
>>
>> Of course, it might be that I am completely wrong! If you've already taken
>> a look at using a register allocation order like the example I gave and
>> have something to share, I'd be happy to read your advice!
>
> Any news on this patch?  It has been a year since it was reverted for
> a bad test that was failing.

Hi Andrew,

  Yeah, those tests are actually expected to fail once the register
  allocation order changes; this is clearly documented in the comments:

  gcc.target/aarch64/aapcs64/abitest-2.h:
  
/* ...

   Note that for a value that is returned in the caller-allocated memory
   block, we get the address from the saved x8 register.  x8 is saved
   just after the callee returns; we assume that x8 has not been
   clobbered by then, although there is no requirement for the callee
   to preserve the value stored in x8.  Luckily, all test cases here
   are simple enough that x8 doesn't normally get clobbered (although
   this is not guaranteed).  */

  I had a local fix which used the redundant value returned in x0 to
  repair the clobbered value in x8, as the two will be identical for a
  structure-type return. However, that trick no longer works, as we
  recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which
  removes that redundant x8-to-x0 copy.
  
  Anyway, I will come back with some benchmark results for this patch on
  top of the latest trunk after the weekend run, and also answers 

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-03 Thread Andrew Pinski
On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
 wrote:
> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang  wrote:
>> >
>> > Andrew Pinski writes:
>> >
>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang  wrote:
>> >>>
>> >>> James Greenhalgh writes:
>> >>>
>> > On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>> > Current IRA still uses both target macros in a few places.
>> >
>> > Tell IRA to use the order we defined rather than its own cost
>> > calculation. Allocate caller-saved registers first, then callee-saved.
>> >
>> > This is especially useful for LR/x30, as it is free to allocate and is
>> > pure caller-saved when used in a leaf function.
>> >
>> > I haven't noticed a significant impact on benchmarks, but grepping for
>> > keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
>> > that the counts are smaller.
>> >
>> > OK for trunk?
>> 
>>  OK, sorry for the delay.
>> 
>>  It might be mail client mangling, but please check that the trailing 
>>  slashes
>>  line up in the version that gets committed.
>> 
>>  Thanks,
>>  James
>> 
>> > 2015-05-19  Jiong. Wang  
>> >
>> > gcc/
>> >  PR 63521
>> >  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>> >  (HONOR_REG_ALLOC_ORDER): Define.
>> >>>
>> >>> Patch reverted.
>> >>
>> >> I did not see a reason why this patch was reverted.  Maybe I am
>> >> missing an email or something.
>> >
>> > There are several execution regressions in the gcc testsuite, although
>> > as far as I can see this patch merely exposed hidden bugs in those
>> > testcases. Still, there might be another issue, so to be conservative I
>> > temporarily reverted this patch.
>>
>> If you are talking about:
>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>> Etc.
>>
>> These test cases are too dependent on the original register allocation order
>> and really can be safely ignored. Really these three tests should be moved or
>> written in a more sane way.
>
> Yup, completely agreed - but the testcases do throw up something
> interesting. If we are allocating registers to hold 128-bit values, and
> we pick x7 as highest preference, we implicitly allocate x8 along with it.
> I think we probably see the same thing if the first thing we do in a
> function is a structure copy through a back-end expanded movmem, which
> will likely begin with a 128-bit LDP using x7, x8.
>
> If the argument for this patch is that we prefer to allocate x7-x0 first,
> followed by x8, then we've potentially made a sub-optimal decision: our
> allocation order for 128-bit values becomes x7,x8,x5,x6, etc.
>
> My hunch is that we *might* get better code generation in this corner case
> out of some permutation of the allocation order for argument
> registers. I'm thinking something along the lines of
>
>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>
> I asked Jiong to take a look at that, and I agree with his decision to
> reduce the churn on trunk and just revert the patch until we've come to
> a conclusion based on some evidence - rather than just my hunch! I agree
> that it would be harmless on trunk from a testing point of view, but I
> think Jiong is right to revert the patch until we better understand the
> code-generation implications.
>
> Of course, it might be that I am completely wrong! If you've already taken
> a look at using a register allocation order like the example I gave and
> have something to share, I'd be happy to read your advice!

Any news on this patch?  It has been a year since it was reverted for
a bad test that was failing.

Thanks,
Andrew

>
> Thanks,
> James
>


Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-27 Thread James Greenhalgh
On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
  On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote:
  
  Andrew Pinski writes:
  
  On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote:
  
  James Greenhalgh writes:
  
  On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
  Current IRA still uses both target macros in a few places.
  
  Tell IRA to use the order we defined rather than its own cost
  calculation. Allocate caller-saved registers first, then callee-saved.
  
  This is especially useful for LR/x30, as it is free to allocate and is
  pure caller-saved when used in a leaf function.
  
  I haven't noticed a significant impact on benchmarks, but grepping for
  keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
  that the counts are smaller.
  
  OK for trunk?
  
  OK, sorry for the delay.
  
  It might be mail client mangling, but please check that the trailing 
  slashes
  line up in the version that gets committed.
  
  Thanks,
  James
  
  2015-05-19  Jiong. Wang  jiong.w...@arm.com
  
  gcc/
   PR 63521
   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
   (HONOR_REG_ALLOC_ORDER): Define.
  
  Patch reverted.
  
  I did not see a reason why this patch was reverted.  Maybe I am
  missing an email or something.
  
  There are several execution regressions in the gcc testsuite, although
  as far as I can see this patch merely exposed hidden bugs in those
  testcases. Still, there might be another issue, so to be conservative I
  temporarily reverted this patch.
 
 If you are talking about:
 gcc.target/aarch64/aapcs64/func-ret-2.c execution
 Etc.
 
 These test cases are too dependent on the original register allocation order
 and really can be safely ignored. Really these three tests should be moved or
 written in a more sane way. 

Yup, completely agreed - but the testcases do throw up something
interesting. If we are allocating registers to hold 128-bit values, and
we pick x7 as highest preference, we implicitly allocate x8 along with it.
I think we probably see the same thing if the first thing we do in a
function is a structure copy through a back-end expanded movmem, which
will likely begin with a 128-bit LDP using x7, x8.

If the argument for this patch is that we prefer to allocate x7-x0 first,
followed by x8, then we've potentially made a sub-optimal decision: our
allocation order for 128-bit values becomes x7,x8,x5,x6, etc.

My hunch is that we *might* get better code generation in this corner case
out of some permutation of the allocation order for argument
registers. I'm thinking something along the lines of

  {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }

I asked Jiong to take a look at that, and I agree with his decision to
reduce the churn on trunk and just revert the patch until we've come to
a conclusion based on some evidence - rather than just my hunch! I agree
that it would be harmless on trunk from a testing point of view, but I
think Jiong is right to revert the patch until we better understand the
code-generation implications.

Of course, it might be that I am completely wrong! If you've already taken
a look at using a register allocation order like the example I gave and
have something to share, I'd be happy to read your advice!

Thanks,
James



Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-27 Thread Jiong Wang

Andrew Pinski writes:

 On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote:

 James Greenhalgh writes:

 On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
 Current IRA still uses both target macros in a few places.

 Tell IRA to use the order we defined rather than its own cost
 calculation. Allocate caller-saved registers first, then callee-saved.

 This is especially useful for LR/x30, as it is free to allocate and is
 pure caller-saved when used in a leaf function.

 I haven't noticed a significant impact on benchmarks, but grepping for
 keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
 that the counts are smaller.

 OK for trunk?

 OK, sorry for the delay.

 It might be mail client mangling, but please check that the trailing slashes
 line up in the version that gets committed.

 Thanks,
 James

 2015-05-19  Jiong. Wang  jiong.w...@arm.com

 gcc/
   PR 63521
   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
   (HONOR_REG_ALLOC_ORDER): Define.

 Patch reverted.

 I did not see a reason why this patch was reverted.  Maybe I am
 missing an email or something.

There are several execution regressions in the gcc testsuite, although
as far as I can see this patch merely exposed hidden bugs in those
testcases. Still, there might be another issue, so to be conservative I
temporarily reverted this patch.


 Thanks,
 Andrew




-- 
Regards,
Jiong



Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-27 Thread pinskia




 On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote:
 
 
 Andrew Pinski writes:
 
 On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote:
 
 James Greenhalgh writes:
 
 On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
 Current IRA still uses both target macros in a few places.

 Tell IRA to use the order we defined rather than its own cost
 calculation. Allocate caller-saved registers first, then callee-saved.

 This is especially useful for LR/x30, as it is free to allocate and is
 pure caller-saved when used in a leaf function.

 I haven't noticed a significant impact on benchmarks, but grepping for
 keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
 that the counts are smaller.
 
 OK for trunk?
 
 OK, sorry for the delay.
 
 It might be mail client mangling, but please check that the trailing 
 slashes
 line up in the version that gets committed.
 
 Thanks,
 James
 
 2015-05-19  Jiong. Wang  jiong.w...@arm.com
 
 gcc/
  PR 63521
  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
  (HONOR_REG_ALLOC_ORDER): Define.
 
 Patch reverted.
 
 I did not see a reason why this patch was reverted.  Maybe I am
 missing an email or something.
 
 There are several execution regressions in the gcc testsuite, although
 as far as I can see this patch merely exposed hidden bugs in those
 testcases. Still, there might be another issue, so to be conservative I
 temporarily reverted this patch.

If you are talking about:
gcc.target/aarch64/aapcs64/func-ret-2.c execution
Etc.

These test cases are too dependent on the original register allocation order 
and really can be safely ignored. Really these three tests should be moved or 
written in a more sane way. 

Thanks,
Andrew

 
 
 Thanks,
 Andrew
 
 
 
 -- 
 Regards,
 Jiong
 


Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-25 Thread Andrew Pinski
On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote:

 James Greenhalgh writes:

 On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
 Current IRA still uses both target macros in a few places.

 Tell IRA to use the order we defined rather than its own cost
 calculation. Allocate caller-saved registers first, then callee-saved.

 This is especially useful for LR/x30, as it is free to allocate and is
 pure caller-saved when used in a leaf function.

 I haven't noticed a significant impact on benchmarks, but grepping for
 keywords like "Spilling", "Push.*spill" etc. in the ira rtl dump shows
 that the counts are smaller.

 OK for trunk?

 OK, sorry for the delay.

 It might be mail client mangling, but please check that the trailing slashes
 line up in the version that gets committed.

 Thanks,
 James

 2015-05-19  Jiong. Wang  jiong.w...@arm.com

 gcc/
   PR 63521
   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
   (HONOR_REG_ALLOC_ORDER): Define.

 Patch reverted.

I did not see a reason why this patch was reverted.  Maybe I am
missing an email or something.

Thanks,
Andrew