[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jakub Jelinek changed:

           What    |Removed                     |Added
           Priority|P1                          |P2

--- Comment #60 from Jakub Jelinek ---
This PR had various fixes applied already and the remaining issues don't warrant a release blocker, so downgrading this to P2.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #59 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #58)
> If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be
> still useful without it (does it ever trigger say on the kernel where it
> didn't trigger before)?

The patch in comment 44 is obviously good, it improves the size by 0.090% as noted (this is a kernel build, multi_v5_defconfig iirc). I'd say it is perfectly safe for GCC 9, but I'm not an Arm maintainer :-)
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #58 from Jakub Jelinek ---
If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be still useful without it (does it ever trigger say on the kernel where it didn't trigger before)?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jeffrey A. Law changed:

           What    |Removed                     |Added
                 CC|                            |law at redhat dot com

--- Comment #57 from Jeffrey A. Law ---
So what's actually left to do with this BZ? I.e., what tests are still regressing?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner changed:

           What    |Removed                     |Added
             Status|ASSIGNED                    |NEW
           Assignee|bergner at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #56 from Peter Bergner ---
I committed the RA fix. Unassigning myself now.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #55 from Peter Bergner ---
Author: bergner
Date: Thu Apr 18 22:14:17 2019
New Revision: 270448

URL: https://gcc.gnu.org/viewcvs?rev=270448&root=gcc&view=rev
Log:
        PR rtl-optimization/87871
        * ira-lives.c (make_object_dead): Don't add conflicts to
        TOTAL_CONFLICT_HARD_REGS for register ignore_reg_for_conflicts.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-lives.c
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #54 from Segher Boessenkool ---
(In reply to Wilco from comment #52)
> (In reply to Segher Boessenkool from comment #48)
> > With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> > That does not fix this PR though :-/
>
> But it does fix most of the codesize regression.

Yes, and it often creates *better* code, as far as I can see.

> The shrinkwrapping testcase
> seems a preexisting problem that was exposed by the combine changes,

It is.

> so it
> doesn't need to hold up the release.  The regalloc change might fix
> addr-modes-float.c too.

I'd like to see the RA fix in GCC 9.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #53 from Segher Boessenkool ---
(In reply to Richard Earnshaw from comment #51)
> In the more general case splitting this would produce worse code, not
> better, since then we'd end up with two instructions rather than one.

Sure, it _often_ is good to have it merged. Quite clearly more often than not it's good, so if you have to pick only one way, this is the way to go. Hopefully we can do better though. But not for stage 4, sure.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #52 from Wilco ---
(In reply to Segher Boessenkool from comment #48)
> With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> That does not fix this PR though :-/

But it does fix most of the codesize regression. The shrinkwrapping testcase seems a preexisting problem that was exposed by the combine changes, so it doesn't need to hold up the release. The regalloc change might fix addr-modes-float.c too.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #51 from Richard Earnshaw ---
(In reply to Segher Boessenkool from comment #50)
> The insn is
>
> (insn 7 3 8 2 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 0 r0 [116])
>                     (const_int 0 [0])))
>             (set (reg/v:SI 4 r4 [orig:112 a ] [112])
>                 (reg:SI 0 r0 [116]))
>         ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>      (nil))
>
> and that isn't split, and then prepare_shrink_wrap gives up on it.

In the more general case splitting this would produce worse code, not better, since then we'd end up with two instructions rather than one.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #50 from Segher Boessenkool ---
The insn is

(insn 7 3 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 0 r0 [116])
                    (const_int 0 [0])))
            (set (reg/v:SI 4 r4 [orig:112 a ] [112])
                (reg:SI 0 r0 [116]))
        ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
     (nil))

and that isn't split, and then prepare_shrink_wrap gives up on it.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #49 from Segher Boessenkool ---
(In reply to Wilco from comment #47)
> (In reply to Segher Boessenkool from comment #46)
> > With all three patches together (Peter's, mine, Jakub's), I get a code size
> > increase of only 0.047%, much more acceptable.  Now looking what that diff
> > really *is* :-)
>
> I think with Jakub's change you don't need to disable the movsi_compare0
> pattern in combine.  If regalloc works as expected, it will get split into a
> compare so shrinkwrap can handle it.

prepare_shrink_wrap cannot handle that. prepare_shrink_wrap needs to be improved for other reasons, of course.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #48 from Segher Boessenkool ---
With just Peter's and Jakub's patch, it *improves* code size by 0.090%. That does not fix this PR though :-/
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #47 from Wilco ---
(In reply to Segher Boessenkool from comment #46)
> With all three patches together (Peter's, mine, Jakub's), I get a code size
> increase of only 0.047%, much more acceptable.  Now looking what that diff
> really *is* :-)

I think with Jakub's change you don't need to disable the movsi_compare0 pattern in combine. If regalloc works as expected, it will get split into a compare so shrinkwrap can handle it.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #46 from Segher Boessenkool ---
With all three patches together (Peter's, mine, Jakub's), I get a code size increase of only 0.047%, much more acceptable. Now looking what that diff really *is* :-)
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner changed:

           What    |Removed                     |Added
                URL|                            |https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00768.html

--- Comment #45 from Peter Bergner ---
I submitted a patch to fix the IRA conflict issue.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #44 from Jakub Jelinek ---
Well, it requires that the RA looks specially for this kind of pattern and if it ends up being a noop move, nothing simplifies the pattern again back to normal comparison, and as Segher noted, it can negatively affect other optimization passes.

Completely untested peephole2 patch:

--- gcc/config/arm/arm.md.jj    2019-03-19 11:04:49.283170205 +0100
+++ gcc/config/arm/arm.md       2019-04-18 16:21:18.974543408 +0200
@@ -10928,12 +10928,22 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
        (match_operand:SI 1 "arm_general_register_operand" ""))
    (set (reg:CC CC_REGNUM)
-       (compare:CC (match_dup 1) (const_int 0)))]
+       (compare:CC (match_operand:SI 2 "arm_general_register_operand" "")
+                   (const_int 0)))]
+  "TARGET_ARM
+   && (rtx_equal_p (operands[2], operands[0])
+       || rtx_equal_p (operands[2], operands[1]))"
+  [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
+             (set (match_dup 0) (match_dup 1))])])
+
+(define_peephole2
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC (match_operand:SI 1 "arm_general_register_operand" "")
+                   (const_int 0)))]
+   (set (match_operand:SI 0 "arm_general_register_operand" "") (match_dup 1))]
   "TARGET_ARM"
   [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
-             (set (match_dup 0) (match_dup 1))])]
-  ""
-)
+             (set (match_dup 0) (match_dup 1))])])
 
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #43 from Peter Bergner ---
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

How does *movsi_compare0 make RA decisions worse other than the issue of p116 not being assigned r0 above, which my patch attached above fixes?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #42 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

Yeah. If someone writes patches adding the peepholes, I can test it, but I'm no hero at writing peepholes, esp. for an arch I do not fully understand :-/
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #41 from Segher Boessenkool ---
(In reply to Wilco from comment #38)
> Well the question really is what is bad about movsi_compare0 that could be
> easily fixed?

"Easily fixed"... There is no such thing here.

Because it is a parallel, everything has to work on the compare and the move together. Various things do not handle that, things that only handle simple moves for example. Like prepare_shrink_wrap in this testcase. And for many other things you have to split the parallel before you can do the transform you want.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #40 from Jakub Jelinek ---
(In reply to Segher Boessenkool from comment #39)
> On a linux kernel defconfig build it increases code size by 0.567%.
> That seems a bit much :-(
>
> The peephole only recognises
>
>   mov rA,rB
>   cmp rB,#0
>
> and not
>
>   mov rA,rB
>   cmp rA,#0

Well, changing the peephole2 so that it handles both of the above at the same time shall be quite easy.

>
> or
>
>   cmp rB,#0
>   mov rA,rB

And adding a peephole for this case too.

The question is what the code size differences would be with those changes (i.e. how often does it help not to have *movsi_compare0 make RA decisions worse vs. how often we actually have those two instructions separated by other insns).
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #39 from Segher Boessenkool ---
On a linux kernel defconfig build it increases code size by 0.567%. That seems a bit much :-(

The peephole only recognises

  mov rA,rB
  cmp rB,#0

and not

  mov rA,rB
  cmp rA,#0

or

  cmp rB,#0
  mov rA,rB

and we see a lot of the latter, after my patch anyway.
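
The three instruction orders listed in this comment can be sketched as a toy matcher. This is only an illustration in plain C, not the arm.md peephole2 patterns: the `toy_insn` type and the `toy_match_*` functions are hypothetical names invented here, registers are modelled as small integers, and the two instructions are assumed to be adjacent.

```c
#include <assert.h>

/* Hypothetical toy instruction, not GCC's RTL. */
enum toy_op { TOY_MOV, TOY_CMP0 };

struct toy_insn {
    enum toy_op op;
    int dst;   /* TOY_MOV: destination register; unused for TOY_CMP0 */
    int src;   /* TOY_MOV: source register; TOY_CMP0: register compared with #0 */
};

/* Models a matcher that only accepts "mov rA,rB ; cmp rB,#0". */
static int toy_match_original(const struct toy_insn *first,
                              const struct toy_insn *second)
{
    return first->op == TOY_MOV && second->op == TOY_CMP0
           && second->src == first->src;
}

/* Models a matcher that accepts all three orders from the comment. */
static int toy_match_extended(const struct toy_insn *first,
                              const struct toy_insn *second)
{
    if (first->op == TOY_MOV && second->op == TOY_CMP0)
        return second->src == first->src      /* mov rA,rB ; cmp rB,#0 */
               || second->src == first->dst;  /* mov rA,rB ; cmp rA,#0 */
    if (first->op == TOY_CMP0 && second->op == TOY_MOV)
        return second->src == first->src;     /* cmp rB,#0 ; mov rA,rB */
    return 0;
}

/* Small self-check with rA = 1 and rB = 2. */
static void toy_peephole_self_check(void)
{
    struct toy_insn mov    = { TOY_MOV, 1, 2 };
    struct toy_insn cmp_rb = { TOY_CMP0, -1, 2 };
    struct toy_insn cmp_ra = { TOY_CMP0, -1, 1 };

    assert(toy_match_original(&mov, &cmp_rb));
    assert(!toy_match_original(&mov, &cmp_ra));
    assert(!toy_match_original(&cmp_rb, &mov));

    assert(toy_match_extended(&mov, &cmp_rb));
    assert(toy_match_extended(&mov, &cmp_ra));
    assert(toy_match_extended(&cmp_rb, &mov));
}
```

Calling `toy_peephole_self_check()` from a `main` exercises all three orders. The sketch says nothing about how often each order occurs in real code, which is the open question in the thread.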
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #38 from Wilco ---
(In reply to Segher Boessenkool from comment #37)
> Yes, it is a balancing act.  Which option works better?

Well the question really is what is bad about movsi_compare0 that could be easily fixed? The move is for free so there is no need for the "r,0" variant in principle, so if that helps reducing constraints on register allocation then we could remove or reorder that alternative.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #37 from Segher Boessenkool ---
Yes, it is a balancing act. Which option works better?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #36 from Richard Earnshaw ---
(In reply to Segher Boessenkool from comment #35)
> Peter's patch solves this particular problem, but not the PR unfortunately.
>
> I finally understand Jakub's comment 30.  This patch solves the PR (also
> without Peter's patch):
>
> ===
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0aecd03..67dddb2 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
>                     (const_int 0)))
>     (set (match_operand:SI 0 "s_register_operand" "=r,r")
>         (match_dup 1))]
> -  "TARGET_32BIT"
> +  "TARGET_32BIT && reload_completed"
>    "@
>     cmp%?\\t%0, #0
>     subs%?\\t%0, %1, #0"
> ===

And what about all the cases where the move and compare are not adjacent in the instruction stream so don't get matched by peepholing?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #35 from Segher Boessenkool ---
Peter's patch solves this particular problem, but not the PR unfortunately.

I finally understand Jakub's comment 30. This patch solves the PR (also without Peter's patch):

===
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..67dddb2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
                    (const_int 0)))
    (set (match_operand:SI 0 "s_register_operand" "=r,r")
        (match_dup 1))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && reload_completed"
   "@
    cmp%?\\t%0, #0
    subs%?\\t%0, %1, #0"
===
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner changed:

           What    |Removed                     |Added
  Attachment #46189|0                           |1
        is obsolete|                            |

--- Comment #34 from Peter Bergner ---
Created attachment 46190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46190&action=edit
Updated patch

Updated patch that is functionally the same, but I like this one better.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner changed:

           What    |Removed                     |Added
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |bergner at gcc dot gnu.org

--- Comment #33 from Peter Bergner ---
Created attachment 46189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46189&action=edit
Proposed patch

Here is a patch that fixes make_object_dead(), which was causing r0 to be incorrectly added to p116's total_conflict_regs and so made it impossible to assign r0 to p116. With this patch, we now assign r0 to p116 like we want:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:
...
      Popping a5(r116,l0)  -- assign reg 0
      Popping a3(r112,l0)  -- assign reg 4
      Popping a2(r114,l0)  -- assign reg 4
      Popping a0(r111,l0)  -- assign reg 0
      Popping a4(r117,l0)  -- assign reg 0
      Popping a1(r113,l0)  -- assign reg 3
Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     3    2:r114 l0     4
    5:r116 l0     0    4:r117 l0     0

Can someone on the ARM side please bootstrap and regtest the patch to see if it fixes the testsuite fallout? I'll bootstrap and regtest it on power.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #32 from Peter Bergner ---
(In reply to Peter Bergner from comment #26)
> (In reply to Vladimir Makarov from comment #25)
> > (In reply to Peter Bergner from comment #24)
> >> I don't know why r0 isn't in profitable_regs for pseudo 116.
> >
> > Profitable regs there contain also conflict regs.  R0 is conflicting with
> > p106.  If R0 usage (in call insn) were in the same BB, your new conflict
> > calculation found that there is no actual conflict.  But IRA uses
> > df-infrastructure which tells IRA that R0 lives at the BB end where p106
> > occurs.
>
> I'm sorry, but I don't see where p116 conflicts with r0.  Can you show me
> where/how?  Looking at my IRA dump, I see:

Ok, so there is a bug in print_allocno_conflicts() that causes us to skip printing the hard reg conflicts if the allocno doesn't have any conflicts with other allocnos. I submitted a patch to fix that. With the fix, I now see the following conflict info for p116:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs: 0
;;     conflict hard regs:

So this explains why p116 isn't assigned r0. That doesn't explain why p116 conflicts with r0 though, because looking at the rtl below, it shouldn't:

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ aD.4197 ])) "bug.i":7:1 181 {*arm_movsi_insn}
     (nil))
(insn 7 50 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 116)
                    (const_int 0 [0])))
            (set (reg/v:SI 112 [ aD.4197 ])
                (reg:SI 116))
        ]) "bug.i":10:6 188 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))

So yes, r0 is live at the definition of p116, but we know they have the same value. My ira-conflicts.c changes adding non_conflicting_reg_copy_p() should have handled that, but that isn't happening. Now non_conflicting_reg_copy_p() does correctly notice that insn 50 is a simple copy that we can ignore for conflict purposes, but somehow, a conflict is still being added.

I tracked the problem down to ira-lives.c:make_object_dead() not handling ignore_reg_for_conflicts correctly. The bug is that we correctly remove the ignored reg (r0) from OBJECT_CONFLICT_HARD_REGS, but we miss removing it from OBJECT_TOTAL_CONFLICT_HARD_REGS too. I'm working on a patch.
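
The difference between the two conflict sets can be illustrated with a toy model. This is a minimal sketch, not the code in GCC: `toy_object`, `toy_hard_reg_set` and the `toy_*` functions are hypothetical stand-ins, hard register sets are modelled as a 32-bit mask, and the assumption that the allocator rejects any register found in the total conflict set is taken from the description in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a hard register set: bit N set means hard register N. */
typedef uint32_t toy_hard_reg_set;

/* Hypothetical, simplified model of an object's two conflict sets. */
struct toy_object {
    toy_hard_reg_set conflict_hard_regs;
    toy_hard_reg_set total_conflict_hard_regs;
};

/* Returns LIVE with IGNORE_REG removed; a negative IGNORE_REG removes nothing. */
static toy_hard_reg_set toy_filter(toy_hard_reg_set live, int ignore_reg)
{
    if (ignore_reg >= 0)
        live &= ~((toy_hard_reg_set)1 << ignore_reg);
    return live;
}

/* Models the described bug: only the first set honours the ignored register. */
static void toy_make_dead_buggy(struct toy_object *obj,
                                toy_hard_reg_set live, int ignore_reg)
{
    obj->conflict_hard_regs |= toy_filter(live, ignore_reg);
    obj->total_conflict_hard_regs |= live;   /* ignored register leaks in */
}

/* Models the described fix: both sets honour the ignored register. */
static void toy_make_dead_fixed(struct toy_object *obj,
                                toy_hard_reg_set live, int ignore_reg)
{
    toy_hard_reg_set filtered = toy_filter(live, ignore_reg);
    obj->conflict_hard_regs |= filtered;
    obj->total_conflict_hard_regs |= filtered;
}

/* Assumed allocator rule: a register in the total conflict set is unusable. */
static int toy_can_assign(const struct toy_object *obj, int hard_regno)
{
    return !((obj->total_conflict_hard_regs >> hard_regno) & 1u);
}

/* Self-check modelling p116 dying while r0 is live but ignored. */
static void toy_conflict_self_check(void)
{
    struct toy_object buggy = { 0, 0 };
    struct toy_object fixed = { 0, 0 };

    toy_make_dead_buggy(&buggy, 1u << 0, 0);
    toy_make_dead_fixed(&fixed, 1u << 0, 0);

    assert(buggy.conflict_hard_regs == 0);
    assert(!toy_can_assign(&buggy, 0));   /* r0 is wrongly rejected */
    assert(toy_can_assign(&fixed, 0));    /* r0 is available */
}
```

In the buggy variant the per-object set looks clean, which matches the dump above where "conflict hard regs" is empty while "total conflict hard regs" lists 0.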
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #31 from Segher Boessenkool ---
It's how you do a parallel of a mov and a flags set, which of course you can have before RA, and you want created by combine, typically. Or do I misunderstand the question?

(I thought Arm has a "movs" op for this, btw?)
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #30 from Jakub Jelinek ---
Is the *movsi_compare0 pattern actually ever a benefit before RA? At least in this case it clearly results in worse generated code rather than better, and I bet in other cases too; it just ties the hands of the RA too much. I wonder if it shouldn't rather be a pattern that is only matched when reload_completed and recognized say by a peephole2 or something similar.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #29 from Peter Bergner ---
(In reply to Segher Boessenkool from comment #27)
> > Note: I'm assuming we're missing a \n after p116's empty conflicts above?
>
> The code is

Right. I already whipped up a patch that gives me:

;; a5(r116,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:

  cp0:a0(r111)<->a4(r117)@330:move
...
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #28 from Peter Bergner ---
Vlad, in looking at add_insn_allocno_copies(), it looks like it relies on seeing REG_DEAD notes to decide whether to record a copy/shuffle that should be handled. Shouldn't we instead be looking at whether the source and destination regs conflict or not? I.e., there might not be a REG_DEAD note, but that doesn't mean the two regs/pseudos conflict. And conversely, if there is a REG_DEAD note on the copy/shuffle, the two regs/pseudos still could conflict.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #27 from Segher Boessenkool ---
(In reply to Peter Bergner from comment #26)
> ;; a4(r117,l0) conflicts: a3(r112,l0)
> ;;     total conflict hard regs:
> ;;     conflict hard regs:
>
> ;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
>   cp1:a2(r114)<->a3(r112)@41:shuffle
>   cp2:a3(r112)<->a5(r116)@125:shuffle
>   pref0:a0(r111)<-hr0@2000
>   pref1:a4(r117)<-hr0@660
>   pref2:a5(r116)<-hr0@1000
>   regions=1, blocks=6, points=10
>     allocnos=6 (big 0), copies=3, conflicts=0, ranges=6
>
> Note: I'm assuming we're missing a \n after p116's empty conflicts above?

The code is

  fputs (" conflicts:", file);
  n = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < n; i++)
    {
      ira_object_t obj = ALLOCNO_OBJECT (a, i);
      ira_object_t conflict_obj;
      ira_object_conflict_iterator oci;

      if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
        continue;
[...]
    }

and the ";; total conflict hard regs:" etc. prints are in that [...].
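
The effect of that early `continue` can be reproduced in a small standalone model. This is a sketch under stated assumptions, not the IRA printing code: `toy_object`, `toy_print_conflicts` and the output format are invented for illustration, a NULL `conflicts` pointer stands in for `OBJECT_CONFLICT_ARRAY (obj) == NULL`, and the "fixed" path is just one possible shape for a fix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one object to print. */
struct toy_object {
    const char *conflicts;        /* NULL: no conflicts with other allocnos */
    const char *total_hard_regs;  /* text for "total conflict hard regs:" */
};

/* Writes the dump text for OBJ into OUT (capacity CAP).
   With SKIP_WHEN_NO_CONFLICTS set, the hard reg line is lost whenever
   there are no allocno conflicts, which models the early `continue`. */
static void toy_print_conflicts(char *out, size_t cap,
                                const struct toy_object *obj,
                                int skip_when_no_conflicts)
{
    int len = snprintf(out, cap, ";; conflicts:");
    if (len < 0 || (size_t)len >= cap)
        return;

    if (obj->conflicts == NULL && skip_when_no_conflicts)
        return;   /* nothing more is printed, not even the newline */

    if (obj->conflicts != NULL) {
        int n = snprintf(out + len, cap - (size_t)len, " %s", obj->conflicts);
        if (n < 0 || (size_t)(len + n) >= cap)
            return;
        len += n;
    }
    snprintf(out + len, cap - (size_t)len,
             "\n;; total conflict hard regs: %s\n", obj->total_hard_regs);
}

/* Self-check modelling p116: no allocno conflicts, hard reg 0 in the total set. */
static void toy_print_self_check(void)
{
    char buf[128];
    struct toy_object p116 = { NULL, "0" };

    toy_print_conflicts(buf, sizeof buf, &p116, 1);
    assert(strcmp(buf, ";; conflicts:") == 0);

    toy_print_conflicts(buf, sizeof buf, &p116, 0);
    assert(strcmp(buf, ";; conflicts:\n;; total conflict hard regs: 0\n") == 0);
}
```

The first call shows why the dump for a5(r116) ran straight into the copy list: both the hard reg lines and the trailing newline are skipped together.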
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #26 from Peter Bergner ---
(In reply to Vladimir Makarov from comment #25)
> (In reply to Peter Bergner from comment #24)
>> I don't know why r0 isn't in profitable_regs for pseudo 116.
>
> Profitable regs there contain also conflict regs.  R0 is conflicting with
> p106.  If R0 usage (in call insn) were in the same BB, your new conflict
> calculation found that there is no actual conflict.  But IRA uses
> df-infrastructure which tells IRA that R0 lives at the BB end where p106
> occurs.

I'm sorry, but I don't see where p116 conflicts with r0. Can you show me where/how? Looking at my IRA dump, I see:

+++Allocating 40 bytes for conflict table (uncompressed size 48)
;; a0(r111,l0) conflicts: a2(r114,l0) a1(r113,l0) a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a1(r113,l0) conflicts: a0(r111,l0) a2(r114,l0) a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a2(r114,l0) conflicts: a0(r111,l0) a1(r113,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a3(r112,l0) conflicts: a0(r111,l0) a1(r113,l0) a4(r117,l0)
;;     total conflict hard regs: 0 12 14
;;     conflict hard regs: 0 12 14

;; a4(r117,l0) conflicts: a3(r112,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
  cp1:a2(r114)<->a3(r112)@41:shuffle
  cp2:a3(r112)<->a5(r116)@125:shuffle
  pref0:a0(r111)<-hr0@2000
  pref1:a4(r117)<-hr0@660
  pref2:a5(r116)<-hr0@1000
  regions=1, blocks=6, points=10
    allocnos=6 (big 0), copies=3, conflicts=0, ranges=6

Note: I'm assuming we're missing a \n after p116's empty conflicts above?

So I don't see p116 conflict with r0, but I do see we register a shuffle between p112 and p116, and p112 does (correctly) conflict with r0. Is it really the shuffle between p112 and p116 that is preventing us from putting r0 into p116's profitable regs, in the hope that p112 and p116 may get assigned the same reg, allowing the removal of the copy?

If so, that shuffle, since it's attached to the setting of the CC reg, cannot actually be removed even if p112 and p116 are assigned the same register. Should we just ignore those types of shuffles/copies that have other side effects?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #25 from Vladimir Makarov ---
(In reply to Peter Bergner from comment #24)
> So improve_allocation() initially looks at using r0, but disregards it
> because check_hard_reg_p() returns false for r0, and that is because we fail
> this test:
>
>       /* Checking only profitable hard regs.  */
>       if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
>         return false;
>
> I don't know why r0 isn't in profitable_regs for pseudo 116.

Profitable regs there contain also conflict regs. R0 is conflicting with p106. If R0 usage (in call insn) were in the same BB, your new conflict calculation would find that there is no actual conflict. But IRA uses df-infrastructure which tells IRA that R0 lives at the BB end where p106 occurs.

So the right solution of the PR would be fixing df-infrastructure live analysis or maybe somehow ignoring the usage of r0 in the call insn.

That is how I see the situation.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #24 from Peter Bergner ---
So improve_allocation() initially looks at using r0, but disregards it because check_hard_reg_p() returns false for r0, and that is because we fail this test:

      /* Checking only profitable hard regs.  */
      if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
        return false;

I don't know why r0 isn't in profitable_regs for pseudo 116.
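
How a single missing bit in the profitable set changes the outcome can be sketched with a toy search loop. This is a simplified model and not `improve_allocation()` itself: the `toy_*` names, the 16-register file, the cost table and the "cheapest allowed register wins" rule are all assumptions made for illustration; only the bit test mirrors the quoted check.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGS 16

/* Toy register set: bit N set means hard register N is a member. */
typedef uint32_t toy_regset;

/* Models the quoted test: reject registers outside the profitable set. */
static int toy_check_hard_reg(toy_regset profitable_regs, int hard_regno)
{
    if (!((profitable_regs >> hard_regno) & 1u))
        return 0;
    return 1;
}

/* Hypothetical search: returns the cheapest register that passes the
   check, or -1 if none does.  Ties go to the lowest register number. */
static int toy_pick_best(const int cost[TOY_NUM_REGS],
                         toy_regset profitable_regs)
{
    int best = -1;
    for (int r = 0; r < TOY_NUM_REGS; r++) {
        if (!toy_check_hard_reg(profitable_regs, r))
            continue;
        if (best < 0 || cost[r] < cost[best])
            best = r;
    }
    return best;
}

/* Self-check: r0 is cheapest, r4 second cheapest (made-up costs). */
static void toy_profitable_self_check(void)
{
    const int cost[TOY_NUM_REGS] = { 0, 10, 10, 10, 5, 10, 10, 10,
                                     10, 10, 10, 10, 10, 10, 10, 10 };

    /* r0 missing from the profitable set: the search settles on r4. */
    assert(toy_pick_best(cost, 0xFFFEu) == 4);
    /* r0 present: it wins. */
    assert(toy_pick_best(cost, 0xFFFFu) == 0);
}
```

With the made-up costs above, the model reproduces the shape of the observed behaviour: r0 would be the best choice, but it is filtered out before costs are even compared, so r4 is chosen instead.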
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #23 from Segher Boessenkool ---
It says (I added some debug)

  Insn 50(l0): point = 27
ignoring for conflicts: (reg:SI 0 r0 [ a ])

but non_conflicting_reg_copy_p isn't called at all where it is improving the allocation.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #22 from Peter Bergner ---
(In reply to Wilco from comment #21)
> (In reply to Vladimir Makarov from comment #20)
>> The question is why p116 conflicts with hr0.  Before RA we have
>
> That's a bug since register copies should not create a conflict. It's one of
> the most basic optimization of register allocator.
>
> And there is also the question why we do move r0 into a virtual register but
> not assign the virtual register to an argument register.

We don't since my patch adding that support in current trunk. That said, if non_conflicting_reg_copy_p() returns NULL_RTX for that r116=r0 copy insn, then they will conflict. So what does non_conflicting_reg_copy_p() return? ...and if it says they conflict, why? The insn has side effects or SImode is a register pair on arm or ???
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #21 from Wilco ---
(In reply to Vladimir Makarov from comment #20)
> (In reply to Wilco from comment #19)
> > (In reply to Peter Bergner from comment #18)
> > > (In reply to Segher Boessenkool from comment #15)
> > > >       Popping a5(r116,l0)  -- assign reg 3
> > > >       Popping a3(r112,l0)  -- assign reg 4
> > > >       Popping a2(r114,l0)  -- assign reg 3
> > > >       Popping a0(r111,l0)  -- assign reg 0
> > > >       Popping a4(r117,l0)  -- assign reg 0
> > > >       Popping a1(r113,l0)  -- assign reg 2
> > > > Assigning 4 to a5r116
> > > > Disposition:
> > > >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > > >     5:r116 l0     4    4:r117 l0     0
> > > >
> > > >
> > > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > > two insns of the function, which are
> > >
> > > So we initially assign r3 to r116 presumably because it has the same cost as
> > > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > > improve_allocation() decides that r4 is a better hard reg and switches the
> > > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
> >
> > I would expect that r116 has a strong preference for r0 given the r116 = mov
> > r0 and thus allocating r116 to r0 should have the lowest cost by a large
> > margin.
>
> p116 conflicts with hr0.  Therefore it can not get hr0.  p112 is connected
> with p116.  p112 got hr4 and p116 got 3.  Assigning 4 to 116 is profitable.
> Therefore assignment of p116 is changed to 4.
>
> The question is why p116 conflicts with hr0.  Before RA we have

That's a bug since register copies should not create a conflict. It's one of the most basic optimization of register allocator.

And there is also the question why we do move r0 into a virtual register but not assign the virtual register to an argument register.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #20 from Vladimir Makarov ---
(In reply to Wilco from comment #19)
> (In reply to Peter Bergner from comment #18)
> > (In reply to Segher Boessenkool from comment #15)
> > >       Popping a5(r116,l0)  -- assign reg 3
> > >       Popping a3(r112,l0)  -- assign reg 4
> > >       Popping a2(r114,l0)  -- assign reg 3
> > >       Popping a0(r111,l0)  -- assign reg 0
> > >       Popping a4(r117,l0)  -- assign reg 0
> > >       Popping a1(r113,l0)  -- assign reg 2
> > > Assigning 4 to a5r116
> > > Disposition:
> > >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > >     5:r116 l0     4    4:r117 l0     0
> > >
> > >
> > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > two insns of the function, which are
> >
> > So we initially assign r3 to r116 presumably because it has the same cost as
> > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > improve_allocation() decides that r4 is a better hard reg and switches the
> > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
>
> I would expect that r116 has a strong preference for r0 given the r116 = mov
> r0 and thus allocating r116 to r0 should have the lowest cost by a large
> margin.

p116 conflicts with hr0. Therefore it can not get hr0. p112 is connected with p116. p112 got hr4 and p116 got 3. Assigning 4 to 116 is profitable. Therefore assignment of p116 is changed to 4.

The question is why p116 conflicts with hr0. Before RA we have

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":11:1 181 {*arm_movsi_insn}
     (nil))                ---> No reg-dead r0!

because later we have

call_insn 11 9 51 3 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("foo") [flags 0x41] ) [0 foo S4 A32])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:SI 14 lr))
        ]) "/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":16:11 219 {*call_value_symbol}
     (expr_list:REG_CALL_DECL (symbol_ref:SI ("foo") [flags 0x41] )
        (nil))
    (expr_list (clobber (reg:SI 12 ip))
        (expr_list:SI (use (reg:SI 0 r0))
            (nil                ---> use r0!
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #19 from Wilco ---
(In reply to Peter Bergner from comment #18)
> (In reply to Segher Boessenkool from comment #15)
> >       Popping a5(r116,l0)  -- assign reg 3
> >       Popping a3(r112,l0)  -- assign reg 4
> >       Popping a2(r114,l0)  -- assign reg 3
> >       Popping a0(r111,l0)  -- assign reg 0
> >       Popping a4(r117,l0)  -- assign reg 0
> >       Popping a1(r113,l0)  -- assign reg 2
> > Assigning 4 to a5r116
> > Disposition:
> >     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> >     5:r116 l0     4    4:r117 l0     0
> >
> >
> > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > two insns of the function, which are
>
> So we initially assign r3 to r116 presumably because it has the same cost as
> the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> improve_allocation() decides that r4 is a better hard reg and switches the
> assignment to that.  I'm not sure why it wouldn't choose r0 there instead.

I would expect that r116 has a strong preference for r0 given the r116 = mov r0 and thus allocating r116 to r0 should have the lowest cost by a large margin.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #18 from Peter Bergner ---
(In reply to Segher Boessenkool from comment #15)
>       Popping a5(r116,l0) -- assign reg 3
>       Popping a3(r112,l0) -- assign reg 4
>       Popping a2(r114,l0) -- assign reg 3
>       Popping a0(r111,l0) -- assign reg 0
>       Popping a4(r117,l0) -- assign reg 0
>       Popping a1(r113,l0) -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
>
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are

So we initially assign r3 to r116 presumably because it has the same cost as
the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
improve_allocation() decides that r4 is a better hard reg and switches the
assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #17 from Jakub Jelinek ---
(In reply to Segher Boessenkool from comment #15)
>       Forming thread by copy 0:a0r111-a4r117 (freq=500):
>         Result (freq=3500): a0r111(2500) a4r117(1000)
>       Forming thread by copy 2:a3r112-a5r116 (freq=125):
>         Result (freq=4500): a3r112(1500) a5r116(3000)
>       Forming thread by copy 1:a2r114-a3r112 (freq=62):
>         Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
>       Pushing a1(r113,l0)(cost 0)
>       Pushing a4(r117,l0)(cost 0)
>       Pushing a0(r111,l0)(cost 0)
>       Pushing a2(r114,l0)(cost 0)
>       Pushing a3(r112,l0)(cost 0)
>       Pushing a5(r116,l0)(cost 0)
>       Popping a5(r116,l0) -- assign reg 3
>       Popping a3(r112,l0) -- assign reg 4
>       Popping a2(r114,l0) -- assign reg 3
>       Popping a0(r111,l0) -- assign reg 0
>       Popping a4(r117,l0) -- assign reg 0
>       Popping a1(r113,l0) -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
>
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are
>
> (insn 50 3 7 2 (set (reg:SI 116)
>         (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181
> {*arm_movsi_insn}
>      (nil))
> (insn 7 50 8 2 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 116)
>                     (const_int 0 [0])))
>             (set (reg/v:SI 112 [ a ])
>                 (reg:SI 116))
>         ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>      (expr_list:REG_DEAD (reg:SI 116)
>         (nil)))
>
> r0 _is_ used by a successor (as the argument for the call to foo), but we
> could use r0 for r116 anyway, since what we assign to it is r0 :-)

CCing Vlad on this.  I don't see that *movsi_compare0 would in any way prefer
the =r,0 alternative over =r,r and using the =r,r alternative would allow to
remove one instruction.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #16 from Segher Boessenkool --- (Which would make insn 50 go away, if you prefer to look at it that way).
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #15 from Segher Boessenkool ---
      Forming thread by copy 0:a0r111-a4r117 (freq=500):
        Result (freq=3500): a0r111(2500) a4r117(1000)
      Forming thread by copy 2:a3r112-a5r116 (freq=125):
        Result (freq=4500): a3r112(1500) a5r116(3000)
      Forming thread by copy 1:a2r114-a3r112 (freq=62):
        Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
      Pushing a1(r113,l0)(cost 0)
      Pushing a4(r117,l0)(cost 0)
      Pushing a0(r111,l0)(cost 0)
      Pushing a2(r114,l0)(cost 0)
      Pushing a3(r112,l0)(cost 0)
      Pushing a5(r116,l0)(cost 0)
      Popping a5(r116,l0) -- assign reg 3
      Popping a3(r112,l0) -- assign reg 4
      Popping a2(r114,l0) -- assign reg 3
      Popping a0(r111,l0) -- assign reg 0
      Popping a4(r117,l0) -- assign reg 0
      Popping a1(r113,l0) -- assign reg 2
Assigning 4 to a5r116
Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
    5:r116 l0     4    4:r117 l0     0

r116 does not conflict with *any* other pseudo.  It is alive in the first
two insns of the function, which are

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181 {*arm_movsi_insn}
     (nil))
(insn 7 50 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 116)
                    (const_int 0 [0])))
            (set (reg/v:SI 112 [ a ])
                (reg:SI 116))
        ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))

r0 _is_ used by a successor (as the argument for the call to foo), but we
could use r0 for r116 anyway, since what we assign to it is r0 :-)
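
For readers cross-checking the dump in comment #15, the final "Disposition" can be derived from the "Popping" and "Assigning" lines alone. The following is a small sketch, not part of GCC: `final_disposition` and the two regular expressions are hypothetical helpers written against the log text exactly as quoted here, and the real dump format may differ in whitespace or detail.

```python
import re

# Log lines copied from comment #15.
LOG = """\
Popping a5(r116,l0) -- assign reg 3
Popping a3(r112,l0) -- assign reg 4
Popping a2(r114,l0) -- assign reg 3
Popping a0(r111,l0) -- assign reg 0
Popping a4(r117,l0) -- assign reg 0
Popping a1(r113,l0) -- assign reg 2
Assigning 4 to a5r116
"""

POP = re.compile(r"Popping a(\d+)\(r(\d+),l\d+\)\s+-- assign reg (\d+)")
REASSIGN = re.compile(r"Assigning (\d+) to a(\d+)r(\d+)")


def final_disposition(log):
    """Map pseudo number to hard register, letting later lines override."""
    hard_reg = {}
    for raw in log.splitlines():
        line = raw.strip()
        m = POP.match(line)
        if m:
            hard_reg[int(m.group(2))] = int(m.group(3))
            continue
        m = REASSIGN.match(line)
        if m:
            hard_reg[int(m.group(3))] = int(m.group(1))
    return hard_reg


disposition = final_disposition(LOG)
for pseudo in sorted(disposition):
    print(f"r{pseudo} -> hard reg {disposition[pseudo]}")
```

The result has r116 ending up in hard reg 4 after first being given 3, and r111 and r117 sharing hard reg 0, which is what the Disposition lines show.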
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #14 from Peter Bergner ---
(In reply to Segher Boessenkool from comment #12)
> Disposition:
>     0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
>     5:r116 l0     4    4:r117 l0     0
>
> If r116 had been allocated hard reg 0 all would be fine (and we know r116
> dies in insn 7 already, there is a REG_DEAD note on it).

What was the order of assignment?  If r116 conflicts with r111 or r117 and
they were assigned first, then that's just bad luck.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #13 from Richard Biener --- Can we xfail/defer the bug?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #12 from Segher Boessenkool ---
(In reply to Segher Boessenkool from comment #11)
> (In reply to Wilco from comment #8)
> >         mov     r4, r0
> >         cmp     r4, #0
>
> Why does it copy r0 to r4 and then compare r4?  On more modern machines it
> is faster to compare r0 itself, and it would allow shrink-wrapping to work
> fine here

We get this in combine:

Trying 2 -> 7:
    2: r112:SI=r116:SI
      REG_DEAD r116:SI
    7: cc:CC=cmp(r112:SI,0)
Successfully matched this instruction:
(parallel [
        (set (reg:CC 100 cc)
            (compare:CC (reg:SI 116)
                (const_int 0 [0])))
        (set (reg/v:SI 112 [ a ])
            (reg:SI 116))
    ])

(that's *movsi_compare0).  This is preceded by

(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 179 {*arm_movsi_insn}
     (nil))

And it stays that way until IRA, which does

Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
    5:r116 l0     4    4:r117 l0     0

If r116 had been allocated hard reg 0 all would be fine (and we know r116
dies in insn 7 already, there is a REG_DEAD note on it).
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #11 from Segher Boessenkool ---
(In reply to Wilco from comment #8)
>         push    {r4, lr}
>         mov     r4, r0
>         cmp     r4, #0

Why does it copy r0 to r4 and then compare r4?  On more modern machines it
is faster to compare r0 itself, and it would allow shrink-wrapping to work
fine here (well, need to move the assignment to r4 down to the block where
it is used, but something will certainly do that, and it is one of the
shrink-wrapping improvements I want to do for GCC 10).

> It seems shrinkwrapping is more random, sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC's.

Shrink-wrapping is very predictable.  But no block where a non-volatile
register is used or set will get shrink-wrapped.  This limitation has
existed since forever.
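
The limitation stated in comment #11 can be shown with a toy check. This is only a sketch under stated assumptions: the call-saved set r4-r11 is taken from the Arm procedure call standard, `can_run_before_prologue` is a hypothetical helper, and the register sets are read by hand off the entry blocks of the two listings in comment #8. It does not model GCC's actual shrink-wrapping pass.

```python
# Call-saved (non-volatile) core registers, assumed from the Arm procedure
# call standard; r9 can be platform-specific, which this toy ignores.
CALL_SAVED = {f"r{n}" for n in range(4, 12)}


def can_run_before_prologue(regs_touched):
    """A block may stay outside the prologue only if it touches no
    call-saved register (the rule as stated in comment #11)."""
    return not (set(regs_touched) & CALL_SAVED)


# Entry block of the old code: "cbz r0, .L5" only reads r0.
old_entry = {"r0"}

# Entry block of the new code: "mov r4, r0; cmp r4, #0; moveq r0, #1".
new_entry = {"r0", "r4"}

print("old entry can skip the prologue:", can_run_before_prologue(old_entry))
print("new entry can skip the prologue:", can_run_before_prologue(new_entry))
```

Under this rule the old entry block qualifies and the new one does not, because the compare now reads r4 rather than the incoming r0.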
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Earnshaw changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rearnsha at gcc dot gnu.org

--- Comment #10 from Richard Earnshaw ---
I wonder if this could be picked up in the post-reload CSE pass?  (ie
rewriting the CBZ to use the incoming hard reg?)
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #9 from Richard Earnshaw ---
(In reply to Wilco from comment #8)
> (In reply to Segher Boessenkool from comment #5)
> > The first one just needs an xfail.  I don't know if it should be *-*-* there
> > or only arm*-*-* should be added.
> >
> > The other two need some debugging by someone who knows the target and/or
> > these tests.
>
> The previous code for Arm was:
>
>         cbz     r0, .L5
>         push    {r4, lr}
>         mov     r4, r0
>         bl      foo
>         movw    r2, #:lower16:.LANCHOR0
>         movt    r2, #:upper16:.LANCHOR0
>         add     r4, r4, r0
>         str     r4, [r2]
>         pop     {r4, pc}
> .L5:
>         movs    r0, #1
>         bx      lr
>
> Now it fails to shrinkwrap:
>
>         push    {r4, lr}
>         mov     r4, r0
>         cmp     r4, #0
>         moveq   r0, #1
>         beq     .L3
>         bl      foo
>         ldr     r2, .L7
>         add     r3, r4, r0
>         str     r3, [r2]
> .L3:
>         pop     {r4, lr}
>         bx      lr
>
> It seems shrinkwrapping is more random, sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC's.

This looks like another fallout of not allowing combine to merge with hard
regs.  Previously the CBZ could be moved outside of the prologue because it
operated directly on the incoming hard reg.  Now it only sees the value
after the copy into the pseudo, which is a call-saved reg because it's live
over the call.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Wilco changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #8 from Wilco ---
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
>
> The other two need some debugging by someone who knows the target and/or
> these tests.

The previous code for Arm was:

        cbz     r0, .L5
        push    {r4, lr}
        mov     r4, r0
        bl      foo
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        add     r4, r4, r0
        str     r4, [r2]
        pop     {r4, pc}
.L5:
        movs    r0, #1
        bx      lr

Now it fails to shrinkwrap:

        push    {r4, lr}
        mov     r4, r0
        cmp     r4, #0
        moveq   r0, #1
        beq     .L3
        bl      foo
        ldr     r2, .L7
        add     r3, r4, r0
        str     r3, [r2]
.L3:
        pop     {r4, lr}
        bx      lr

It seems shrinkwrapping is more random, sometimes it's done as expected,
sometimes it is not. It was more consistent on older GCC's.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P1
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Ramana Radhakrishnan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-12-14
     Ever confirmed|0                           |1

--- Comment #7 from Ramana Radhakrishnan ---
Confirmed.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Ramana Radhakrishnan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org

--- Comment #6 from Ramana Radhakrishnan ---
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
>
> The other two need some debugging by someone who knows the target and/or
> these tests.

for the addr-modes-float.c case there are additional vmov's being generated
and thus is certainly a regression.

--- 8.s 2018-12-14 09:41:04.367843079 +
+++ addr-modes-float.s 2018-12-14 09:40:39.907980812 +
@@ -139,10 +139,13 @@
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
+       vmov    q8, q0  @ ti
        mov     r3, r0
+       vmov    q9, q1  @ ti
        add     r0, r0, #48
-       vst3.8  {d0, d2, d4}, [r3]!
-       vst3.8  {d1, d3, d5}, [r3]
+       vmov    q10, q2  @ ti
+       vst3.8  {d16, d18, d20}, [r3]!
+       vst3.8  {d17, d19, d21}, [r3]
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #5 from Segher Boessenkool --- The first one just needs an xfail. I don't know if it should be *-*-* there or only arm*-*-* should be added. The other two need some debugging by someone who knows the target and/or these tests.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #4 from Christophe Lyon ---
As of r266293, the following regressions reported here are still failing:

FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
FAIL: gcc.target/arm/addr-modes-float.c scan-assembler vst3.8\t{d[02468], d[02468], d[02468]}, \\[r[0-9]+\\]!
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
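
To see why the addr-modes-float.c scan fails on the new output, the pattern can be tried against the two `vst3.8` lines from the diff in comment #6. This is an approximation only: the testsuite matches with Tcl regular expressions, while the sketch uses Python's `re` with the braces escaped, and it assumes the mnemonic and operands are separated by a tab in the real assembler output.

```python
import re

# scan-assembler pattern from the FAIL line, with one level of Tcl
# backslash escaping removed and the braces escaped for Python.
PATTERN = re.compile(r"vst3.8\t\{d[02468], d[02468], d[02468]\}, \[r[0-9]+\]!")

# The removed and added lines from the diff in comment #6 (tab assumed).
old_line = "\tvst3.8\t{d0, d2, d4}, [r3]!"
new_line = "\tvst3.8\t{d16, d18, d20}, [r3]!"

print("old output matches:", bool(PATTERN.search(old_line)))
print("new output matches:", bool(PATTERN.search(new_line)))
```

Each `d[02468]` accepts exactly one digit, so the two-digit registers d16, d18 and d20 produced after the extra vmov instructions cannot match. The failing scan is therefore a direct symptom of the extra copies into q8-q10.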
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #3 from Segher Boessenkool --- I don't know, this is up to the arm people. I don't know if all problems reported here are fixed now.
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška ---
Segher: Can the bug be marked as resolved?
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871 --- Comment #1 from Segher Boessenkool ---
Author: segher
Date: Mon Nov 5 21:18:22 2018
New Revision: 265821

URL: https://gcc.gnu.org/viewcvs?rev=265821&root=gcc&view=rev
Log:
combine: Don't make an intermediate reg for assigning to sfp (PR87871)

The code with an intermediate register is perfectly fine, but LRA
apparently cannot handle the resulting code, or perhaps something else
is wrong.  In either case, making an extra temporary will not likely
help here, so let's just skip it.

        PR rtl-optimization/87871
        * combine.c (make_more_copies): Skip if dest is frame_pointer_rtx.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0