Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-12-03 Thread Christophe Lyon
On 27 November 2015 at 12:26, Ramana Radhakrishnan
 wrote:
>
>
> On 27/11/15 09:40, Renlin Li wrote:
>> Hi Ramana,
>>
>> On 16/10/15 14:54, Renlin Li wrote:
>>>
>>>
 The command line implies we remove r7 (frame pointer in Thumb2 - 
 historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
 (-mpic-register) which
 leaves us with:

 * r0, r1
 * r2, r3
 * r4, r5

 as the only free registers available for DImode values for the whole 
 compilation.

 We then have r0, r1 and r2 live across the insn which means that there are 
 no free registers to handle DImode values
 under the constraints provided unless LRA / reload can spill the argument 
 registers which it doesn't seem to be able to do
 in this particular testcase. Vlad, is that correct ?
>>> According to the logic, conflict hard register are excluded from spill 
>>> candidate. That's why, in this case, r0, r1, r2 cannot be used.
>>
>>
>> In the test case, there are code structure like this.
>>
>>
>> uint64_t callee (int a, int b, int c, int d);
>> uint64_t caller (int a, int b, int c, int d)
>> {
>>   uint64_t res;
>> /*
>> single BB contains complicated data processing which requires register pair
>> */
>>
>>   res = callee (tmp, b ,c, d);
>>   return res;
>> }
>>
>> CES pass in this case will extend the hard register live range across the 
>> whole BB until the callee. In this case, r1, r2, r3 are excluded from 
>> allocatable registers.
>>
>> There are places in CES which prevents extending the hard register's live 
>> range, for example for hard register which fullfil 
>> small_register_classes_for_mode_p(), class_likely_spilled_p(). However, 
>> argument registers belong to neither of them.
>>
>> I tried to stop CES from extending argument registers live range. However, 
>> later, scheduler jumps in and re-orders the instruction to reduce the pseudo 
>> register pressure, which in effect extend the argument register live again.
>
> Thanks for digging further and trying to figure out what the solution was. I 
> can't think of a less risky fix than what you have proposed, thus Ok if no 
> regressions.
>
>

Hi,

I have noticed regressions after this commit to the 4.9 branch:
Passed now fails  [PASS => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (test for excess errors)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (test for excess errors)
Pass disappears   [PASS => ]:
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects
Fail appears  [ => FAIL]:
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (internal compiler error)
  gcc.c-torture/compile/pr34856.c  -O3 -fomit-frame-pointer
-funroll-loops  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2  (internal
compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error)
  gcc.c-torture/execute/scal-to-vec1.c compilation,  -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)

See the red links in
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-4_9-branch/231177/report-build-info.html

Christophe.

> regards
> Ramana
>
>
>
>
>
>>
>> Regards,
>>
>> Renlin Li
>>
>>
>>


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-11-27 Thread Ramana Radhakrishnan


On 27/11/15 09:40, Renlin Li wrote:
> Hi Ramana,
> 
> On 16/10/15 14:54, Renlin Li wrote:
>>
>>
>>> The command line implies we remove r7 (frame pointer in Thumb2 - historical 
>>> accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) 
>>> which
>>> leaves us with:
>>>
>>> * r0, r1
>>> * r2, r3
>>> * r4, r5
>>>
>>> as the only free registers available for DImode values for the whole 
>>> compilation.
>>>
>>> We then have r0, r1 and r2 live across the insn which means that there are 
>>> no free registers to handle DImode values
>>> under the constraints provided unless LRA / reload can spill the argument 
>>> registers which it doesn't seem to be able to do
>>> in this particular testcase. Vlad, is that correct ?
>> According to the logic, conflict hard register are excluded from spill 
>> candidate. That's why, in this case, r0, r1, r2 cannot be used.
> 
> 
> In the test case, there are code structure like this.
> 
> 
> uint64_t callee (int a, int b, int c, int d);
> uint64_t caller (int a, int b, int c, int d)
> {
>   uint64_t res;
> /*
> single BB contains complicated data processing which requires register pair
> */
> 
>   res = callee (tmp, b ,c, d);
>   return res;
> }
> 
> CES pass in this case will extend the hard register live range across the 
> whole BB until the callee. In this case, r1, r2, r3 are excluded from 
> allocatable registers.
> 
> There are places in CES which prevents extending the hard register's live 
> range, for example for hard register which fullfil 
> small_register_classes_for_mode_p(), class_likely_spilled_p(). However, 
> argument registers belong to neither of them.
> 
> I tried to stop CES from extending argument registers live range. However, 
> later, scheduler jumps in and re-orders the instruction to reduce the pseudo 
> register pressure, which in effect extend the argument register live again.

Thanks for digging further and trying to figure out what the solution was. I 
can't think of a less risky fix than what you have proposed, thus Ok if no 
regressions.


regards
Ramana





> 
> Regards,
> 
> Renlin Li
> 
> 
> 


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-11-27 Thread Renlin Li

Hi Ramana,

On 16/10/15 14:54, Renlin Li wrote:



The command line implies we remove r7 (frame pointer in Thumb2 - 
historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 
(-mpic-register) which

leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.


We then have r0, r1 and r2 live across the insn which means that 
there are no free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the 
argument registers which it doesn't seem to be able to do

in this particular testcase. Vlad, is that correct ?
According to the logic, conflict hard register are excluded from spill 
candidate. That's why, in this case, r0, r1, r2 cannot be used.



In the test case, there are code structure like this.


uint64_t callee (int a, int b, int c, int d);
uint64_t caller (int a, int b, int c, int d)
{
  uint64_t res;
/*
single BB contains complicated data processing which requires register pair
*/

  res = callee (tmp, b ,c, d);
  return res;
}

CES pass in this case will extend the hard register live range across 
the whole BB until the callee. In this case, r1, r2, r3 are excluded 
from allocatable registers.


There are places in CES which prevents extending the hard register's 
live range, for example for hard register which fullfil 
small_register_classes_for_mode_p(), class_likely_spilled_p(). However, 
argument registers belong to neither of them.


I tried to stop CES from extending argument registers live range. 
However, later, scheduler jumps in and re-orders the instruction to 
reduce the pseudo register pressure, which in effect extend the argument 
register live again.


Regards,

Renlin Li





Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-10-16 Thread Ramana Radhakrishnan
On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote:
> Hi all,
> 
> This is a backport patch to loosen restrictions on core registers
> for DImode values in Thumb2.
> 
> It fixes PR67383. In this particular case, reload tries to spill a
> hard register, and use next register together as a pair to reload a
> DImode pseudo register. However, the spilled register number is
> odd.This is rejected by arm_hard_regno_mode_ok(). There is no other
> register available, so the compiler throws an ICE.

I was not convinced enough by the reasoning provided in the description
because this patch was intended to be a bit of an optimization
rather than a correctness fix.

The command line implies we remove r7 (frame pointer in Thumb2 - historical 
accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.

We then have r0, r1 and r2 live across the insn which means that there are no 
free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the argument 
registers which it doesn't seem to be able to do
in this particular testcase. Vlad, is that correct ?

Then I wondered why the same problem did not occur in ARM state given that has 
the same restriction.
In ARM state life is a bit better because the Frame pointer is r11 which means 
you pretty much have r6 and r7
as well available in addition to the above list, which means that theoretically 
you can
get away with this in ARM state.

Can you do some more comparison with ARM state as to why we don't have the same 
issue there ?


> 
> The test case in PR67383 is too big, so I didn't include it as part
> of the patch.

I've put up a reduced testcase on the bz, the one I was using to debug.

> arm-none-eabi regression test Okay without any new issues. Okay to
> backport to 4.9?

For changes of this nature please bootstrap and regression test this in arm and 
thumb2 state as well please.

regards
Ramana


Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-10-16 Thread Renlin Li

Hi Ramana,

On 16/10/15 11:52, Ramana Radhakrishnan wrote:

On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote:

Hi all,

This is a backport patch to loosen restrictions on core registers
for DImode values in Thumb2.

It fixes PR67383. In this particular case, reload tries to spill a
hard register, and use next register together as a pair to reload a
DImode pseudo register. However, the spilled register number is
odd.This is rejected by arm_hard_regno_mode_ok(). There is no other
register available, so the compiler throws an ICE.

I was not convinced enough by the reasoning provided in the description
because this patch was intended to be a bit of an optimization
rather than a correctness fix.


True, It's not a fix. It just allows more flexibility for register 
allocation.



The command line implies we remove r7 (frame pointer in Thumb2 - historical 
accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which
leaves us with:

* r0, r1
* r2, r3
* r4, r5

as the only free registers available for DImode values for the whole 
compilation.

We then have r0, r1 and r2 live across the insn which means that there are no 
free registers to handle DImode values
under the constraints provided unless LRA / reload can spill the argument 
registers which it doesn't seem to be able to do
in this particular testcase. Vlad, is that correct ?
According to the logic, conflict hard register are excluded from spill 
candidate. That's why, in this case, r0, r1, r2 cannot be used.



Then I wondered why the same problem did not occur in ARM state given that has 
the same restriction.
In ARM state life is a bit better because the Frame pointer is r11 which means 
you pretty much have r6 and r7
as well available in addition to the above list, which means that theoretically 
you can
get away with this in ARM state.

Can you do some more comparison with ARM state as to why we don't have the same 
issue there ?


Presumably, ARM state should suffer from the same issue. I will have a look.

Regards,
Renlin

The test case in PR67383 is too big, so I didn't include it as part
of the patch.

I've put up a reduced testcase on the bz, the one I was using to debug.


arm-none-eabi regression test Okay without any new issues. Okay to
backport to 4.9?

For changes of this nature please bootstrap and regression test this in arm and 
thumb2 state as well please.

regards
Ramana





[PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"

2015-10-15 Thread Renlin Li

Hi all,

This is a backport patch to loosen restrictions on core registers for 
DImode values in Thumb2.


It fixes PR67383. In this particular case, reload tries to spill a hard 
register, and use next register together as a pair to reload a DImode 
pseudo register. However, the spilled register number is odd.This is 
rejected by arm_hard_regno_mode_ok(). There is no other register 
available, so the compiler throws an ICE.



The test case in PR67383 is too big, so I didn't include it as part of 
the patch.
arm-none-eabi regression test Okay without any new issues. Okay to 
backport to 4.9?


Regards,
Renlin Li

gcc/ChangeLog:

2015-10-15  Renlin Li  

PR target/67383
Backport from mainline.
2014-04-22  Ramana Radhakrishnan 

* config/arm/arm.c (arm_hard_regno_mode_ok): Loosen
restrictions on core registers for DImode values in Thumb2.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08b5255..88d957a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22646,12 +22646,19 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode)
 }
 
   /* We allow almost any value to be stored in the general registers.
- Restrict doubleword quantities to even register pairs so that we can
- use ldrd.  Do not allow very large Neon structure opaque modes in
- general registers; they would use too many.  */
+ Restrict doubleword quantities to even register pairs in ARM state
+ so that we can use ldrd.  Do not allow very large Neon structure
+ opaque modes in general registers; they would use too many.  */
   if (regno <= LAST_ARM_REGNUM)
-return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0)
-  && ARM_NUM_REGS (mode) <= 4;
+{
+  if (ARM_NUM_REGS (mode) > 4)
+	  return FALSE;
+
+  if (TARGET_THUMB2)
+	return TRUE;
+
+  return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0);
+}
 
   if (regno == FRAME_POINTER_REGNUM
   || regno == ARG_POINTER_REGNUM)