Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
On 27 November 2015 at 12:26, Ramana Radhakrishnanwrote: > > > On 27/11/15 09:40, Renlin Li wrote: >> Hi Ramana, >> >> On 16/10/15 14:54, Renlin Li wrote: >>> >>> The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which leaves us with: * r0, r1 * r2, r3 * r4, r5 as the only free registers available for DImode values for the whole compilation. We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do in this particular testcase. Vlad, is that correct ? >>> According to the logic, conflict hard register are excluded from spill >>> candidate. That's why, in this case, r0, r1, r2 cannot be used. >> >> >> In the test case, there are code structure like this. >> >> >> uint64_t callee (int a, int b, int c, int d); >> uint64_t caller (int a, int b, int c, int d) >> { >> uint64_t res; >> /* >> single BB contains complicated data processing which requires register pair >> */ >> >> res = callee (tmp, b ,c, d); >> return res; >> } >> >> CES pass in this case will extend the hard register live range across the >> whole BB until the callee. In this case, r1, r2, r3 are excluded from >> allocatable registers. >> >> There are places in CES which prevents extending the hard register's live >> range, for example for hard register which fullfil >> small_register_classes_for_mode_p(), class_likely_spilled_p(). However, >> argument registers belong to neither of them. >> >> I tried to stop CES from extending argument registers live range. However, >> later, scheduler jumps in and re-orders the instruction to reduce the pseudo >> register pressure, which in effect extend the argument register live again. > > Thanks for digging further and trying to figure out what the solution was. I > can't think of a less risky fix than what you have proposed, thus Ok if no > regressions. > > Hi, I have noticed regressions after this commit to the 4.9 branch: Passed now fails [PASS => FAIL]: gcc.c-torture/compile/pr34856.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors) gcc.c-torture/compile/pr34856.c -O3 -fomit-frame-pointer -funroll-loops (test for excess errors) Pass disappears [PASS => ]: gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 -flto -fno-use-linker-plugin -flto-partition=none gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects Fail appears [ => FAIL]: gcc.c-torture/compile/pr34856.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error) gcc.c-torture/compile/pr34856.c -O3 -fomit-frame-pointer -funroll-loops (internal compiler error) gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 (internal compiler error) gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error) gcc.c-torture/execute/scal-to-vec1.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error) See the red links in http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-4_9-branch/231177/report-build-info.html Christophe. > regards > Ramana > > > > > >> >> Regards, >> >> Renlin Li >> >> >>
Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
On 27/11/15 09:40, Renlin Li wrote: > Hi Ramana, > > On 16/10/15 14:54, Renlin Li wrote: >> >> >>> The command line implies we remove r7 (frame pointer in Thumb2 - historical >>> accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) >>> which >>> leaves us with: >>> >>> * r0, r1 >>> * r2, r3 >>> * r4, r5 >>> >>> as the only free registers available for DImode values for the whole >>> compilation. >>> >>> We then have r0, r1 and r2 live across the insn which means that there are >>> no free registers to handle DImode values >>> under the constraints provided unless LRA / reload can spill the argument >>> registers which it doesn't seem to be able to do >>> in this particular testcase. Vlad, is that correct ? >> According to the logic, conflict hard register are excluded from spill >> candidate. That's why, in this case, r0, r1, r2 cannot be used. > > > In the test case, there are code structure like this. > > > uint64_t callee (int a, int b, int c, int d); > uint64_t caller (int a, int b, int c, int d) > { > uint64_t res; > /* > single BB contains complicated data processing which requires register pair > */ > > res = callee (tmp, b ,c, d); > return res; > } > > CES pass in this case will extend the hard register live range across the > whole BB until the callee. In this case, r1, r2, r3 are excluded from > allocatable registers. > > There are places in CES which prevents extending the hard register's live > range, for example for hard register which fullfil > small_register_classes_for_mode_p(), class_likely_spilled_p(). However, > argument registers belong to neither of them. > > I tried to stop CES from extending argument registers live range. However, > later, scheduler jumps in and re-orders the instruction to reduce the pseudo > register pressure, which in effect extend the argument register live again. Thanks for digging further and trying to figure out what the solution was. I can't think of a less risky fix than what you have proposed, thus Ok if no regressions. regards Ramana > > Regards, > > Renlin Li > > >
Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
Hi Ramana, On 16/10/15 14:54, Renlin Li wrote: The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which leaves us with: * r0, r1 * r2, r3 * r4, r5 as the only free registers available for DImode values for the whole compilation. We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do in this particular testcase. Vlad, is that correct ? According to the logic, conflict hard register are excluded from spill candidate. That's why, in this case, r0, r1, r2 cannot be used. In the test case, there are code structure like this. uint64_t callee (int a, int b, int c, int d); uint64_t caller (int a, int b, int c, int d) { uint64_t res; /* single BB contains complicated data processing which requires register pair */ res = callee (tmp, b ,c, d); return res; } CES pass in this case will extend the hard register live range across the whole BB until the callee. In this case, r1, r2, r3 are excluded from allocatable registers. There are places in CES which prevents extending the hard register's live range, for example for hard register which fullfil small_register_classes_for_mode_p(), class_likely_spilled_p(). However, argument registers belong to neither of them. I tried to stop CES from extending argument registers live range. However, later, scheduler jumps in and re-orders the instruction to reduce the pseudo register pressure, which in effect extend the argument register live again. Regards, Renlin Li
Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote: > Hi all, > > This is a backport patch to loosen restrictions on core registers > for DImode values in Thumb2. > > It fixes PR67383. In this particular case, reload tries to spill a > hard register, and use next register together as a pair to reload a > DImode pseudo register. However, the spilled register number is > odd.This is rejected by arm_hard_regno_mode_ok(). There is no other > register available, so the compiler throws an ICE. I was not convinced enough by the reasoning provided in the description because this patch was intended to be a bit of an optimization rather than a correctness fix. The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which leaves us with: * r0, r1 * r2, r3 * r4, r5 as the only free registers available for DImode values for the whole compilation. We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do in this particular testcase. Vlad, is that correct ? Then I wondered why the same problem did not occur in ARM state given that has the same restriction. In ARM state life is a bit better because the Frame pointer is r11 which means you pretty much have r6 and r7 as well available in addition to the above list, which means that theoretically you can get away with this in ARM state. Can you do some more comparison with ARM state as to why we don't have the same issue there ? > > The test case in PR67383 is too big, so I didn't include it as part > of the patch. I've put up a reduced testcase on the bz, the one I was using to debug. > arm-none-eabi regression test Okay without any new issues. Okay to > backport to 4.9? For changes of this nature please bootstrap and regression test this in arm and thumb2 state as well please. regards Ramana
Re: [PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
Hi Ramana, On 16/10/15 11:52, Ramana Radhakrishnan wrote: On Thu, Oct 15, 2015 at 03:01:24PM +0100, Renlin Li wrote: Hi all, This is a backport patch to loosen restrictions on core registers for DImode values in Thumb2. It fixes PR67383. In this particular case, reload tries to spill a hard register, and use next register together as a pair to reload a DImode pseudo register. However, the spilled register number is odd.This is rejected by arm_hard_regno_mode_ok(). There is no other register available, so the compiler throws an ICE. I was not convinced enough by the reasoning provided in the description because this patch was intended to be a bit of an optimization rather than a correctness fix. True, It's not a fix. It just allows more flexibility for register allocation. The command line implies we remove r7 (frame pointer in Thumb2 - historical accident, fno-omit-frame-pointer), r9 (ffixed-r9), r10 (-mpic-register) which leaves us with: * r0, r1 * r2, r3 * r4, r5 as the only free registers available for DImode values for the whole compilation. We then have r0, r1 and r2 live across the insn which means that there are no free registers to handle DImode values under the constraints provided unless LRA / reload can spill the argument registers which it doesn't seem to be able to do in this particular testcase. Vlad, is that correct ? According to the logic, conflict hard register are excluded from spill candidate. That's why, in this case, r0, r1, r2 cannot be used. Then I wondered why the same problem did not occur in ARM state given that has the same restriction. In ARM state life is a bit better because the Frame pointer is r11 which means you pretty much have r6 and r7 as well available in addition to the above list, which means that theoretically you can get away with this in ARM state. Can you do some more comparison with ARM state as to why we don't have the same issue there ? Presumably, ARM state should suffer from the same issue. I will have a look. Regards, Renlin The test case in PR67383 is too big, so I didn't include it as part of the patch. I've put up a reduced testcase on the bz, the one I was using to debug. arm-none-eabi regression test Okay without any new issues. Okay to backport to 4.9? For changes of this nature please bootstrap and regression test this in arm and thumb2 state as well please. regards Ramana
[PR67383][ARM][4.9]Backport of "Allow any register for DImode values in Thumb2"
Hi all, This is a backport patch to loosen restrictions on core registers for DImode values in Thumb2. It fixes PR67383. In this particular case, reload tries to spill a hard register, and use next register together as a pair to reload a DImode pseudo register. However, the spilled register number is odd.This is rejected by arm_hard_regno_mode_ok(). There is no other register available, so the compiler throws an ICE. The test case in PR67383 is too big, so I didn't include it as part of the patch. arm-none-eabi regression test Okay without any new issues. Okay to backport to 4.9? Regards, Renlin Li gcc/ChangeLog: 2015-10-15 Renlin LiPR target/67383 Backport from mainline. 2014-04-22 Ramana Radhakrishnan * config/arm/arm.c (arm_hard_regno_mode_ok): Loosen restrictions on core registers for DImode values in Thumb2. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 08b5255..88d957a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22646,12 +22646,19 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode) } /* We allow almost any value to be stored in the general registers. - Restrict doubleword quantities to even register pairs so that we can - use ldrd. Do not allow very large Neon structure opaque modes in - general registers; they would use too many. */ + Restrict doubleword quantities to even register pairs in ARM state + so that we can use ldrd. Do not allow very large Neon structure + opaque modes in general registers; they would use too many. */ if (regno <= LAST_ARM_REGNUM) -return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0) - && ARM_NUM_REGS (mode) <= 4; +{ + if (ARM_NUM_REGS (mode) > 4) + return FALSE; + + if (TARGET_THUMB2) + return TRUE; + + return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0); +} if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)