[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-25 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P1  |P2

--- Comment #60 from Jakub Jelinek  ---
This PR had various fixes applied already and the remaining issues don't
warrant a release blocker, so downgrading this to P2.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-23 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #59 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #58)
> If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be
> still useful without it (does it ever trigger say on the kernel where it
> didn't trigger before)?

The patch in comment 44 is obviously good, it improves the size by 0.090%
as noted (this is a kernel build, multi_v5_defconfig iirc).

I'd say it is perfectly safe for GCC 9, but I'm not an Arm maintainer :-)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-23 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #58 from Jakub Jelinek  ---
If we don't want to go with #c35 at least for GCC 9, would the #c44 patch be
still useful without it (does it ever trigger say on the kernel where it didn't
trigger before)?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-23 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #57 from Jeffrey A. Law  ---
So what's actually left to do with this BZ?  ie, what tests are still
regressing?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|bergner at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #56 from Peter Bergner  ---
I committed the RA fix.  Unassigning myself now.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #55 from Peter Bergner  ---
Author: bergner
Date: Thu Apr 18 22:14:17 2019
New Revision: 270448

URL: https://gcc.gnu.org/viewcvs?rev=270448&root=gcc&view=rev
Log:
PR rtl-optimization/87871
* ira-lives.c (make_object_dead): Don't add conflicts to
TOTAL_CONFLICT_HARD_REGS for register ignore_reg_for_conflicts.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira-lives.c

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #54 from Segher Boessenkool  ---
(In reply to Wilco from comment #52)
> (In reply to Segher Boessenkool from comment #48)
> > With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> > That does not fix this PR though :-/
> 
> But it does fix most of the codesize regression.

Yes, and it often creates *better* code, as far as I can see.

> The shrinkwrapping testcase
> seems a preexisting problem that was exposed by the combine changes,

It is.

> so it
> doesn't need to hold up the release. The regalloc change might fix
> addr-modes-float.c too.

I'd like to see the RA fix in GCC 9.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #53 from Segher Boessenkool  ---
(In reply to Richard Earnshaw from comment #51)
> In the more general case splitting this would produce worse code, not
> better, since then we'd end up with two instructions rather than one.

Sure, it _often_ is good to have it merged.  Quite clearly more often than
not it's good, so if you have to pick only one way, this is the way to go.

Hopefully we can do better though.  But not for stage 4, sure.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #52 from Wilco  ---
(In reply to Segher Boessenkool from comment #48)
> With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
> That does not fix this PR though :-/

But it does fix most of the codesize regression. The shrinkwrapping testcase
seems a preexisting problem that was exposed by the combine changes, so it
doesn't need to hold up the release. The regalloc change might fix
addr-modes-float.c too.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #51 from Richard Earnshaw  ---
(In reply to Segher Boessenkool from comment #50)
> The insn is
> 
> (insn 7 3 8 2 (parallel [
>             (set (reg:CC 100 cc)
>                 (compare:CC (reg:SI 0 r0 [116])
>                     (const_int 0 [0])))
>             (set (reg/v:SI 4 r4 [orig:112 a ] [112])
>                 (reg:SI 0 r0 [116]))
>         ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>      (nil))
> 
> and that isn't split, and then prepare_shrink_wrap gives up on it.

In the more general case splitting this would produce worse code, not better,
since then we'd end up with two instructions rather than one.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #50 from Segher Boessenkool  ---
The insn is

(insn 7 3 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 0 r0 [116])
                    (const_int 0 [0])))
            (set (reg/v:SI 4 r4 [orig:112 a ] [112])
                (reg:SI 0 r0 [116]))
        ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
     (nil))

and that isn't split, and then prepare_shrink_wrap gives up on it.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #49 from Segher Boessenkool  ---
(In reply to Wilco from comment #47)
> (In reply to Segher Boessenkool from comment #46)
> > With all three patches together (Peter's, mine, Jakub's), I get a code size
> > increase of only 0.047%, much more acceptable.  Now looking what that diff
> > really *is* :-)
> 
> I think with Jakub's change you don't need to disable the movsi_compare0
> pattern in combine. If regalloc works as expected, it will get split into a
> compare so shrinkwrap can handle it.

prepare_shrink_wrap can not handle that.  prepare_shrink_wrap needs to be
improved for other reasons, of course.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #48 from Segher Boessenkool  ---
With just Peter's and Jakub's patch, it *improves* code size by 0.090%.
That does not fix this PR though :-/

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #47 from Wilco  ---
(In reply to Segher Boessenkool from comment #46)
> With all three patches together (Peter's, mine, Jakub's), I get a code size
> increase of only 0.047%, much more acceptable.  Now looking what that diff
> really *is* :-)

I think with Jakub's change you don't need to disable the movsi_compare0
pattern in combine. If regalloc works as expected, it will get split into a
compare so shrinkwrap can handle it.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #46 from Segher Boessenkool  ---
With all three patches together (Peter's, mine, Jakub's), I get a code size
increase of only 0.047%, much more acceptable.  Now looking what that diff
really *is* :-)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00768.html

--- Comment #45 from Peter Bergner  ---
I submitted a patch to fix the IRA conflict issue.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #44 from Jakub Jelinek  ---
Well, it requires that the RA looks specially for this kind of pattern and if
it ends up being a noop move, nothing simplifies the pattern again back to
normal comparison, and as Segher noted, it can negatively affect other
optimization passes.

Completely untested peephole2 patch:
--- gcc/config/arm/arm.md.jj 2019-03-19 11:04:49.283170205 +0100
+++ gcc/config/arm/arm.md   2019-04-18 16:21:18.974543408 +0200
@@ -10928,12 +10928,22 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(match_operand:SI 1 "arm_general_register_operand" ""))
(set (reg:CC CC_REGNUM)
-   (compare:CC (match_dup 1) (const_int 0)))]
+   (compare:CC (match_operand:SI 2 "arm_general_register_operand" "")
+   (const_int 0)))]
+  "TARGET_ARM
+   && (rtx_equal_p (operands[2], operands[0])
+   || rtx_equal_p (operands[2], operands[1]))"
+  [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
+ (set (match_dup 0) (match_dup 1))])])
+
+(define_peephole2
+  [(set (reg:CC CC_REGNUM)
+   (compare:CC (match_operand:SI 1 "arm_general_register_operand" "")
+   (const_int 0)))]
+   (set (match_operand:SI 0 "arm_general_register_operand" "") (match_dup 1))]
   "TARGET_ARM"
   [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (const_int 0)))
- (set (match_dup 0) (match_dup 1))])]
-  ""
-)
+ (set (match_dup 0) (match_dup 1))])])

 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #43 from Peter Bergner  ---
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

How does *movsi_compare0 make RA decisions worse other than the issue of p116
not being assigned r0 above, which my patch attached above fixes?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #42 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #40)
> The question is what the code size differences would be with those changes
> (i.e. how often does it help not to have *movsi_compare0 make RA decisions
> worse vs. how often we actually have those two instructions separated by
> other insns).

Yeah.  If someone writes patches adding the peepholes, I can test it, but I'm
no hero at writing peepholes, esp. for an arch I do not fully understand :-/

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #41 from Segher Boessenkool  ---
(In reply to Wilco from comment #38)
> Well the question really is what is bad about movsi_compare0 that could be
> easily fixed?

"Easily fixed"...  There is no such thing here.

Because it is a parallel, everything has to work on the compare and the move
together.  Various things do not handle that, things that only handle simple
moves for example.  Like prepare_shrink_wrap in this testcase.  And for many
other things you have to split the parallel before you can do the transform
you want.
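
The point about passes that only handle simple moves can be pictured with a toy model. This is only a sketch: `struct toy_pattern` and `toy_simple_move_p` are hypothetical names, not GCC's RTL types or the real prepare_shrink_wrap logic. It just shows why a check written for a lone register move rejects a parallel that pairs the move with a compare.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an insn pattern.
   This is not GCC's RTL representation.  */
enum toy_set_kind { TOY_SET_MOVE, TOY_SET_COMPARE };

struct toy_pattern
{
  size_t n_sets;               /* 1 for a plain insn, 2 for a parallel */
  enum toy_set_kind kinds[2];  /* what each set in the pattern does */
};

/* A pass that only understands plain register moves accepts a pattern
   only when it consists of exactly one set and that set is a move.  */
bool
toy_simple_move_p (const struct toy_pattern *pat)
{
  return pat->n_sets == 1 && pat->kinds[0] == TOY_SET_MOVE;
}
```

In this model a pattern shaped like *movsi_compare0 (a compare plus a move in one parallel) is rejected even though it contains a move, which is the situation described for prepare_shrink_wrap in this testcase.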

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #40 from Jakub Jelinek  ---
(In reply to Segher Boessenkool from comment #39)
> On a linux kernel defconfig build it increases code size by 0.567%.
> That seems a bit much :-(
> 
> The peephole only recognises
> 
>   mov rA,rB
>   cmp rB,#0
> 
> and not
> 
>   mov rA,rB
>   cmp rA,#0

Well, changing the peephole2 so that it handles both of the above at the same
time shall be quite easy.
> 
> or
> 
>   cmp rB,#0
>   mov rA,rB

And adding a peephole for this case too.

The question is what the code size differences would be with those changes
(i.e. how often does it help not to have *movsi_compare0 make RA decisions
worse vs. how often we actually have those two instructions separated by other
insns).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #39 from Segher Boessenkool  ---
On a linux kernel defconfig build it increases code size by 0.567%.
That seems a bit much :-(

The peephole only recognises

  mov rA,rB
  cmp rB,#0

and not

  mov rA,rB
  cmp rA,#0

or

  cmp rB,#0
  mov rA,rB

and we see a lot of the latter, after my patch anyway.
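
As a rough illustration of the three shapes listed above, here is a toy matcher. It is a sketch under stated assumptions: `struct toy_insn` and `toy_fusable_p` are made-up names that model two adjacent instructions, not the arm.md peephole2 machinery. It accepts a move followed by a compare of either register, and a compare followed by a move only when the compare tests the move's source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction representation; not GCC's RTL.  */
enum toy_op { TOY_MOV, TOY_CMP0, TOY_OTHER };

struct toy_insn
{
  enum toy_op op;
  int dst;  /* destination register of TOY_MOV, unused otherwise */
  int src;  /* source register of TOY_MOV, or the register that
               TOY_CMP0 compares against zero */
};

/* Return true if the adjacent insns A then B can be fused into a single
   flag-setting move.  On success store the move's registers in *DST
   and *SRC.  */
bool
toy_fusable_p (const struct toy_insn *a, const struct toy_insn *b,
               int *dst, int *src)
{
  const struct toy_insn *mov, *cmp;

  if (a->op == TOY_MOV && b->op == TOY_CMP0)
    {
      mov = a;
      cmp = b;
      /* mov rA,rB ; cmp rB,#0   or   mov rA,rB ; cmp rA,#0.
         After the move both registers hold the same value, so a
         compare of either one sets the same flags.  */
      if (cmp->src != mov->src && cmp->src != mov->dst)
        return false;
    }
  else if (a->op == TOY_CMP0 && b->op == TOY_MOV)
    {
      cmp = a;
      mov = b;
      /* cmp rB,#0 ; mov rA,rB.  The compare runs before the move, so
         it has to test the move's source register.  */
      if (cmp->src != mov->src)
        return false;
    }
  else
    return false;

  *dst = mov->dst;
  *src = mov->src;
  return true;
}
```

Note that this only covers adjacent instructions; it says nothing about the case where other insns sit between the move and the compare.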

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #38 from Wilco  ---
(In reply to Segher Boessenkool from comment #37)
> Yes, it is a balancing act.  Which option works better?

Well the question really is what is bad about movsi_compare0 that could be
easily fixed?

The move is for free so there is no need for the "r,0" variant in principle, so
if that helps reducing constraints on register allocation then we could remove
or reorder that alternative.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #37 from Segher Boessenkool  ---
Yes, it is a balancing act.  Which option works better?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #36 from Richard Earnshaw  ---
(In reply to Segher Boessenkool from comment #35)
> Peter's patch solves this particular problem, but not the PR unfortunately.
> 
> I finally understand Jakub's comment 30.  This patch solves the PR (also
> without Peter's patch):
> 
> ===
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0aecd03..67dddb2 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
> (const_int 0)))
> (set (match_operand:SI 0 "s_register_operand" "=r,r")
> (match_dup 1))]
> -  "TARGET_32BIT"
> +  "TARGET_32BIT && reload_completed"
>"@
> cmp%?\\t%0, #0
> subs%?\\t%0, %1, #0"
> ===

And what about all the cases where the move and compare are not adjacent in the
instruction stream so don't get matched by peepholing?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #35 from Segher Boessenkool  ---
Peter's patch solves this particular problem, but not the PR unfortunately.

I finally understand Jakub's comment 30.  This patch solves the PR (also
without Peter's patch):

===
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..67dddb2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6340,7 +6340,7 @@ (define_insn "*movsi_compare0"
(const_int 0)))
(set (match_operand:SI 0 "s_register_operand" "=r,r")
(match_dup 1))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && reload_completed"
   "@
cmp%?\\t%0, #0
subs%?\\t%0, %1, #0"
===

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner  changed:

   What|Removed |Added

  Attachment #46189 is obsolete|0   |1

--- Comment #34 from Peter Bergner  ---
Created attachment 46190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46190&action=edit
Updated patch

Updated patch that is functionally the same, but I like this one better.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Peter Bergner  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |bergner at gcc dot gnu.org

--- Comment #33 from Peter Bergner  ---
Created attachment 46189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46189&action=edit
Proposed patch

Here is a patch that fixes make_object_dead() that was causing r0 to be
incorrectly added to p116's total_conflict_regs which made it impossible to
assign r0 to p116.  With this patch, we now assign r0 to p116 like we want:

;; a5(r116,l0) conflicts:
;; total conflict hard regs:
;; conflict hard regs:

...

  Popping a5(r116,l0)  -- assign reg 0
  Popping a3(r112,l0)  -- assign reg 4
  Popping a2(r114,l0)  -- assign reg 4
  Popping a0(r111,l0)  -- assign reg 0
  Popping a4(r117,l0)  -- assign reg 0
  Popping a1(r113,l0)  -- assign reg 3
Disposition:
    0:r111 l0     0    3:r112 l0     4    1:r113 l0     3    2:r114 l0     4
    5:r116 l0     0    4:r117 l0     0


Can someone on the ARM side please bootstrap and regtest the patch to see if it
fixes the testsuite fallout?  I'll bootstrap and regtest it on power.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #32 from Peter Bergner  ---
(In reply to Peter Bergner from comment #26)
> (In reply to Vladimir Makarov from comment #25)
> > (In reply to Peter Bergner from comment #24)
> >> I don't know why r0 isn't in profitable_regs for pseudo 116.
> >  
> > Profitable regs there contain also conflict regs.  R0 is conflicting with
> > p106. If the R0 usage (in the call insn) were in the same BB, your new
> > conflict calculation would find that there is no actual conflict.  But IRA
> > uses the df infrastructure, which tells IRA that R0 is live at the end of
> > the BB where p106 occurs.
> 
> I'm sorry, but I don't see where p116 conflicts with r0.  Can you show me
> where/how?  Looking at my IRA dump, I see:

Ok, so there is a bug in print_allocno_conflicts() that causes us to skip
printing the hard reg conflicts if the allocno doesn't have any conflicts with
other allocnos.  I submitted a patch to fix that.  With the fix, I now see the
following conflict info for p116:

;; a5(r116,l0) conflicts:
;; total conflict hard regs: 0
;; conflict hard regs:

So this explains why p116 isn't assigned r0.  That doesn't explain why p116
conflicts with r0 though, because looking at the rtl below, it shouldn't:


(insn 50 3 7 2 (set (reg:SI 116)
        (reg:SI 0 r0 [ aD.4197 ])) "bug.i":7:1 181 {*arm_movsi_insn}
     (nil))
(insn 7 50 8 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 116)
                    (const_int 0 [0])))
            (set (reg/v:SI 112 [ aD.4197 ])
                (reg:SI 116))
        ]) "bug.i":10:6 188 {*movsi_compare0}
     (expr_list:REG_DEAD (reg:SI 116)
        (nil)))


So yes, r0 is live at the definition of p116, but we know they have the same
value.  My ira-conflicts.c changes adding non_conflicting_reg_copy_p() should
have handled that, but they don't.  Now non_conflicting_reg_copy_p() does
correctly notice that insn 50 is a simple copy that we can ignore for conflict
purposes, but somehow, a conflict is still being added.

I tracked the problem down to ira-lives.c:make_object_dead() not handling
ignore_reg_for_conflicts correctly.  The bug is that we correctly remove the
ignored reg (r0) from OBJECT_CONFLICT_HARD_REGS, but we miss removing it from
OBJECT_TOTAL_CONFLICT_HARD_REGS too.  I'm working on a patch.
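
The shape of that bug can be sketched with a toy model. Everything here is an assumption made for illustration: `toy_hard_reg_set`, `struct toy_object` and both functions are hypothetical and use a plain 32-bit mask, they are not the real ira-lives.c code or GCC's HARD_REG_SET. The buggy variant filters the ignored register out of one set only, which reproduces the "total conflict hard regs: 0" with an empty "conflict hard regs" seen in the dump above.

```c
#include <stdint.h>

/* Toy stand-in for a hard register set: bit N set means hard reg N.
   Only registers 0..31 are representable in this sketch.  */
typedef uint32_t toy_hard_reg_set;

struct toy_object
{
  toy_hard_reg_set conflict_hard_regs;
  toy_hard_reg_set total_conflict_hard_regs;
};

/* Buggy shape: the ignored register is removed before updating
   conflict_hard_regs, but the unfiltered LIVE set is still merged
   into total_conflict_hard_regs.  Pass IGNORE_REG < 0 for "none".  */
void
toy_add_conflicts_buggy (struct toy_object *obj, toy_hard_reg_set live,
                         int ignore_reg)
{
  toy_hard_reg_set filtered = live;

  if (ignore_reg >= 0)
    filtered &= ~((toy_hard_reg_set) 1 << ignore_reg);
  obj->conflict_hard_regs |= filtered;
  obj->total_conflict_hard_regs |= live;  /* ignored reg leaks in here */
}

/* Fixed shape: both sets are updated from the filtered set.  */
void
toy_add_conflicts_fixed (struct toy_object *obj, toy_hard_reg_set live,
                         int ignore_reg)
{
  toy_hard_reg_set filtered = live;

  if (ignore_reg >= 0)
    filtered &= ~((toy_hard_reg_set) 1 << ignore_reg);
  obj->conflict_hard_regs |= filtered;
  obj->total_conflict_hard_regs |= filtered;
}
```

With r0 live and r0 as the ignored register, the buggy variant leaves bit 0 set in the total set while the fixed variant leaves both sets empty.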

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #31 from Segher Boessenkool  ---
It's how you do a parallel of a mov and a flags set, which of course you
can have before RA, and you want created by combine, typically.  Or do I
misunderstand the question?

(I thought Arm has a "movs" op for this, btw?)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #30 from Jakub Jelinek  ---
Is the *movsi_compare0 pattern actually ever a benefit before RA?  At least in
this case it clearly results in worse generated code rather than better, and
I bet in other cases too; it just ties the hands of the RA too much.
I wonder whether it shouldn't rather be a pattern that is only matched when
reload_completed and recognized, say, by a peephole2 or something similar.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-16 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #29 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #27)
> > Note: I'm assuming we're missing a \n after p116's empty conflicts above?
> 
> The code is

Right.  I already whipped up a patch that gives me:

;; a5(r116,l0) conflicts:
;; total conflict hard regs:
;; conflict hard regs:


  cp0:a0(r111)<->a4(r117)@330:move
  ...

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-16 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #28 from Peter Bergner  ---
Vlad, in looking at add_insn_allocno_copies(), it looks like it relies on
seeing REG_DEAD notes to decide whether to record a copy/shuffle that should be
handled.  Shouldn't we instead be looking at whether the source and destination
regs conflict or not?  Ie, there might not be a REG_DEAD note, but that doesn't
mean the two regs/pseudos conflict.  And conversely, if there is a REG_DEAD
note on the copy/shuffle, the two regs/pseudos still could conflict.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #27 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #26)
> ;; a4(r117,l0) conflicts: a3(r112,l0)
> ;; total conflict hard regs:
> ;; conflict hard regs:
> 
> ;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
>   cp1:a2(r114)<->a3(r112)@41:shuffle
>   cp2:a3(r112)<->a5(r116)@125:shuffle
>   pref0:a0(r111)<-hr0@2000
>   pref1:a4(r117)<-hr0@660
>   pref2:a5(r116)<-hr0@1000
>   regions=1, blocks=6, points=10
> allocnos=6 (big 0), copies=3, conflicts=0, ranges=6
> 
> Note: I'm assuming we're missing a \n after p116's empty conflicts above?

The code is

  fputs (" conflicts:", file);
  n = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < n; i++)
    {
      ira_object_t obj = ALLOCNO_OBJECT (a, i);
      ira_object_t conflict_obj;
      ira_object_conflict_iterator oci;

      if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
        continue;
      [...]
    }

and the

;; total conflict hard regs:

etc. prints are in that [...].
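
A minimal sketch of the printing order one would expect instead, under loud assumptions: `struct toy_allocno`, `toy_append` and `toy_print_conflicts` are invented for illustration, the model has one object per allocno, and it writes into a buffer rather than a FILE. It is not the actual print_allocno_conflicts patch. The idea is only that a missing conflict array skips the allocno list, not the hard register lines.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy allocno: names and register lists are pre-formatted
   strings so the sketch can focus on the control flow.  */
struct toy_allocno
{
  const char *name;
  const char *const *conflicts;  /* NULL when there is no conflict array */
  size_t n_conflicts;
  const char *total_hard_regs;   /* e.g. " 0 12 14", or "" when empty */
  const char *hard_regs;
};

/* Append S to the NUL-terminated string in BUF, truncating if needed.  */
static void
toy_append (char *buf, size_t size, const char *s)
{
  size_t used = strlen (buf);

  if (used + 1 < size)
    snprintf (buf + used, size - used, "%s", s);
}

/* Format the conflict info for A into BUF.  The hard register lines are
   printed whether or not A has a conflict array.  */
void
toy_print_conflicts (char *buf, size_t size, const struct toy_allocno *a)
{
  if (size == 0)
    return;
  buf[0] = '\0';
  toy_append (buf, size, ";; ");
  toy_append (buf, size, a->name);
  toy_append (buf, size, " conflicts:");
  if (a->conflicts != NULL)
    for (size_t i = 0; i < a->n_conflicts; i++)
      {
        toy_append (buf, size, " ");
        toy_append (buf, size, a->conflicts[i]);
      }
  toy_append (buf, size, "\n;; total conflict hard regs:");
  toy_append (buf, size, a->total_hard_regs);
  toy_append (buf, size, "\n;; conflict hard regs:");
  toy_append (buf, size, a->hard_regs);
  toy_append (buf, size, "\n");
}
```

For an allocno with no conflict array this still emits all three lines, each ending in a newline, so the following copy list would start on its own line.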

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-16 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #26 from Peter Bergner  ---
(In reply to Vladimir Makarov from comment #25)
> (In reply to Peter Bergner from comment #24)
>> I don't know why r0 isn't in profitable_regs for pseudo 116.
>  
> Profitable regs there contain also conflict regs.  R0 is conflicting with
> p106. If the R0 usage (in the call insn) were in the same BB, your new
> conflict calculation would find that there is no actual conflict.  But IRA
> uses the df infrastructure, which tells IRA that R0 is live at the end of
> the BB where p106 occurs.

I'm sorry, but I don't see where p116 conflicts with r0.  Can you show me
where/how?  Looking at my IRA dump, I see:


+++Allocating 40 bytes for conflict table (uncompressed size 48)
;; a0(r111,l0) conflicts: a2(r114,l0) a1(r113,l0) a3(r112,l0)
;; total conflict hard regs:
;; conflict hard regs:

;; a1(r113,l0) conflicts: a0(r111,l0) a2(r114,l0) a3(r112,l0)
;; total conflict hard regs:
;; conflict hard regs:

;; a2(r114,l0) conflicts: a0(r111,l0) a1(r113,l0)
;; total conflict hard regs:
;; conflict hard regs:

;; a3(r112,l0) conflicts: a0(r111,l0) a1(r113,l0) a4(r117,l0)
;; total conflict hard regs: 0 12 14
;; conflict hard regs: 0 12 14

;; a4(r117,l0) conflicts: a3(r112,l0)
;; total conflict hard regs:
;; conflict hard regs:

;; a5(r116,l0) conflicts:  cp0:a0(r111)<->a4(r117)@330:move
  cp1:a2(r114)<->a3(r112)@41:shuffle
  cp2:a3(r112)<->a5(r116)@125:shuffle
  pref0:a0(r111)<-hr0@2000
  pref1:a4(r117)<-hr0@660
  pref2:a5(r116)<-hr0@1000
  regions=1, blocks=6, points=10
allocnos=6 (big 0), copies=3, conflicts=0, ranges=6

Note: I'm assuming we're missing a \n after p116's empty conflicts above?

So I don't see p116 conflict with r0, but I do see we register a shuffle
between p112 and p116 and p112 does (correctly) conflict with r0.  Is it really
the shuffle between p112 and p116 that is preventing us from putting r0 into
p116's profitable regs in the hope the p112 and p116 may get assigned the same
reg allowing the removal of the copy?  If so, that shuffle, since it's attached
to the setting of the CC reg cannot actually be removed even if p112 and p116
are assigned the same register.  Should we just ignore those types of
shuffles/copies that have other side effects?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-14 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #25 from Vladimir Makarov  ---
(In reply to Peter Bergner from comment #24)
> So improve_allocation() initially looks at using r0, but disregards it
> because check_hard_reg_p() returns false for r0, and that is because we fail
> this test:
> 
>   /* Checking only profitable hard regs.  */
>   if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
>     return false;
> 
> I don't know why r0 isn't in profitable_regs for pseudo 116.

Profitable regs there contain also conflict regs.  R0 is conflicting with p106.
If the R0 usage (in the call insn) were in the same BB, your new conflict
calculation would find that there is no actual conflict.  But IRA uses the df
infrastructure, which tells IRA that R0 is live at the end of the BB where p106
occurs.

So the right solution for the PR would be fixing the df-infrastructure live
analysis, or maybe somehow ignoring the usage of r0 in the call insn.  That is
how I see the situation.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #24 from Peter Bergner  ---
So improve_allocation() initially looks at using r0, but disregards it because
check_hard_reg_p() returns false for r0, and that is because we fail this test:

  /* Checking only profitable hard regs.  */
  if (! TEST_HARD_REG_BIT (profitable_regs, hard_regno))
    return false;

I don't know why r0 isn't in profitable_regs for pseudo 116.
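
One possible reading of that test, as a toy model only: assume the profitable set is the allocno's class registers minus anything in its total conflict set. The names `toy_hard_reg_set`, `toy_profitable_regs` and `toy_check_hard_reg_p` are hypothetical, and the 32-bit mask is a simplification of GCC's hard register sets. Under that assumption a single stray bit for r0 in the total conflict set is enough to make the check fail for r0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a hard register set: bit N set means hard reg N.
   Only registers 0..31 are representable in this sketch.  */
typedef uint32_t toy_hard_reg_set;

/* Assumed model: a hard reg is profitable if it belongs to the class
   and is not in the total conflict set.  */
toy_hard_reg_set
toy_profitable_regs (toy_hard_reg_set class_regs,
                     toy_hard_reg_set total_conflicts)
{
  return class_regs & ~total_conflicts;
}

/* Mirror of the quoted test: reject any hard reg that is not in the
   profitable set.  HARD_REGNO must be in the range 0..31.  */
bool
toy_check_hard_reg_p (toy_hard_reg_set profitable, int hard_regno)
{
  return ((profitable >> hard_regno) & 1) != 0;
}
```

With class registers r0 to r4 and r0 recorded as a total conflict, the check rejects r0 but still accepts r3 and r4 in this model.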

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #23 from Segher Boessenkool  ---
It says (I added some debug)

   Insn 50(l0): point = 27
ignoring for conflicts:
(reg:SI 0 r0 [ a ])

but non_conflicting_reg_copy_p isn't called at all where it is improving
the allocation.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #22 from Peter Bergner  ---
(In reply to Wilco from comment #21)
> (In reply to Vladimir Makarov from comment #20)
>> The question is why p116 conflicts with hr0.  Before RA we have
> 
> That's a bug since register copies should not create a conflict. It's one of
> the most basic optimizations of a register allocator.
> 
> And there is also the question why we do move r0 into a virtual register but
> not assign the virtual register to an argument register.

We don't since my patch adding that support in current trunk.  That said, if
non_conflicting_reg_copy_p() returns NULL_RTX for that r116=r0 copy insn, then
they will conflict.  So what does non_conflicting_reg_copy_p() return?  ...and
if it says they conflict, why?  The insn has side effects or SImode is a
register pair on arm or ???

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #21 from Wilco  ---
(In reply to Vladimir Makarov from comment #20)
> (In reply to Wilco from comment #19)
> > (In reply to Peter Bergner from comment #18)
> > > (In reply to Segher Boessenkool from comment #15)
> > > >   Popping a5(r116,l0)  -- assign reg 3
> > > >   Popping a3(r112,l0)  -- assign reg 4
> > > >   Popping a2(r114,l0)  -- assign reg 3
> > > >   Popping a0(r111,l0)  -- assign reg 0
> > > >   Popping a4(r117,l0)  -- assign reg 0
> > > >   Popping a1(r113,l0)  -- assign reg 2
> > > > Assigning 4 to a5r116
> > > > Disposition:
> > > > 0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > > > 5:r116 l0     4    4:r117 l0     0
> > > > 
> > > > 
> > > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > > two insns of the function, which are
> > > 
> > > So we initially assign r3 to r116 presumably because it has the same cost as
> > > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > > improve_allocation() decides that r4 is a better hard reg and switches the
> > > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
> > 
> > I would expect that r116 has a strong preference for r0 given the r116 = mov
> > r0 and thus allocating r116 to r0 should have the lowest cost by a large
> > margin.
> 
> p116 conflicts with hr0.  Therefore it can not get hr0.  p112 is connected
> with p116.  p112 got hr4 and p116 got 3.  Assigning 4 to 116 is profitable. 
> Therefore assignment of p116 is changed to 4.
> 
> The question is why p116 conflicts with hr0.  Before RA we have

That's a bug since register copies should not create a conflict. It's one of
the most basic optimizations of a register allocator.

And there is also the question why we do move r0 into a virtual register but
not assign the virtual register to an argument register.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #20 from Vladimir Makarov  ---
(In reply to Wilco from comment #19)
> (In reply to Peter Bergner from comment #18)
> > (In reply to Segher Boessenkool from comment #15)
> > >   Popping a5(r116,l0)  -- assign reg 3
> > >   Popping a3(r112,l0)  -- assign reg 4
> > >   Popping a2(r114,l0)  -- assign reg 3
> > >   Popping a0(r111,l0)  -- assign reg 0
> > >   Popping a4(r117,l0)  -- assign reg 0
> > >   Popping a1(r113,l0)  -- assign reg 2
> > > Assigning 4 to a5r116
> > > Disposition:
> > > 0:r111 l0     0    3:r112 l0     4    1:r113 l0     2    2:r114 l0     3
> > > 5:r116 l0     4    4:r117 l0     0
> > > 
> > > 
> > > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > > two insns of the function, which are
> > 
> > So we initially assign r3 to r116 presumably because it has the same cost as
> > the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> > improve_allocation() decides that r4 is a better hard reg and switches the
> > assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
> 
> I would expect that r116 has a strong preference for r0 given the r116 = mov
> r0 and thus allocating r116 to r0 should have the lowest cost by a large
> margin.

p116 conflicts with hr0.  Therefore it can not get hr0.  p112 is connected with
p116.  p112 got hr4 and p116 got 3.  Assigning 4 to 116 is profitable. 
Therefore assignment of p116 is changed to 4.

The question is why p116 conflicts with hr0.  Before RA we have

(insn 50 3 7 2 (set (reg:SI 116)
(reg:SI 0 r0 [ a ]))
"/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":11:1
181 {*arm_movsi_insn}
 (nil))

---> No reg-dead r0!

because later we have

(call_insn 11 9 51 3 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("foo") [flags 0x41]
) [0 foo S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
])
"/home/cygnus/vmakarov/build1/trunk/gcc/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c":16:11
219 {*call_value_symbol}
 (expr_list:REG_CALL_DECL (symbol_ref:SI ("foo") [flags 0x41]
)
(nil))
(expr_list (clobber (reg:SI 12 ip))
(expr_list:SI (use (reg:SI 0 r0))
(nil))))

---> use r0!
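
To make the conflict concrete, here is a minimal sketch of how such a conflict arises and how a copy-aware rule would avoid it.  This is not IRA's code: it assumes a straight-line list of insns, a backward liveness walk, and the textbook rule that a register being defined conflicts with everything live at that point.  The names `Insn`, `conflicts`, and `copy_aware` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Insn:
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)
    # (dest, src) when the insn is a plain register-to-register copy.
    copy: Optional[Tuple[str, str]] = None


def conflicts(insns, copy_aware):
    """Walk backwards and record which registers are live at the same time."""
    live = set()
    result = set()
    for insn in reversed(insns):
        for d in insn.defs:
            for other in live - {d}:
                # At "d = src" both registers hold the same value, so a
                # copy-aware allocator does not treat them as conflicting.
                if copy_aware and insn.copy == (d, other):
                    continue
                result.add(frozenset((d, other)))
        live -= insn.defs
        live |= insn.uses
    return result


# Hypothetical, simplified model of the three insns discussed above.
insns = [
    Insn(defs={"p116"}, uses={"r0"}, copy=("p116", "r0")),  # insn 50
    Insn(defs={"p112"}, uses={"p116"}),                     # insn 7
    Insn(defs={"r0"}, uses={"r0"}),                         # call to foo
]

pair = frozenset(("p116", "r0"))
print(pair in conflicts(insns, copy_aware=False))  # naive rule
print(pair in conflicts(insns, copy_aware=True))   # copy-aware rule
```

With the naive rule p116 conflicts with r0 because r0 stays live up to the call; with the copy exception it does not, so r0 would be a legal choice for p116.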

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #19 from Wilco  ---
(In reply to Peter Bergner from comment #18)
> (In reply to Segher Boessenkool from comment #15)
> >   Popping a5(r116,l0)  -- assign reg 3
> >   Popping a3(r112,l0)  -- assign reg 4
> >   Popping a2(r114,l0)  -- assign reg 3
> >   Popping a0(r111,l0)  -- assign reg 0
> >   Popping a4(r117,l0)  -- assign reg 0
> >   Popping a1(r113,l0)  -- assign reg 2
> > Assigning 4 to a5r116
> > Disposition:
> > 0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
> > 5:r116 l0 4    4:r117 l0 0
> > 
> > 
> > r116 does not conflict with *any* other pseudo.  It is alive in the first
> > two insns of the function, which are
> 
> So we initially assign r3 to r116 presumably because it has the same cost as
> the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
> improve_allocation() decides that r4 is a better hard reg and switches the
> assignment to that.  I'm not sure why it wouldn't choose r0 there instead.

I would expect that r116 has a strong preference for r0 given the r116 = mov r0
and thus allocating r116 to r0 should have the lowest cost by a large margin.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #18 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #15)
>   Popping a5(r116,l0)  -- assign reg 3
>   Popping a3(r112,l0)  -- assign reg 4
>   Popping a2(r114,l0)  -- assign reg 3
>   Popping a0(r111,l0)  -- assign reg 0
>   Popping a4(r117,l0)  -- assign reg 0
>   Popping a1(r113,l0)  -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
> 0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
> 5:r116 l0 4    4:r117 l0 0
> 
> 
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are

So we initially assign r3 to r116 presumably because it has the same cost as
the other gprs and it occurs first in REG_ALLOC_ORDER.  Then
improve_allocation() decides that r4 is a better hard reg and switches the
assignment to that.  I'm not sure why it wouldn't choose r0 there instead.
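
A rough sketch of that two-step behaviour, under loud assumptions: this is not IRA's algorithm, the allocation order below is only illustrative, and `pick_hard_reg` and `improve` are hypothetical helpers.  The set of forbidden registers models the r0 conflict explained in comment #20, and the saving of 125 is just a positive placeholder.

```python
# Illustrative order only; the real ARM REG_ALLOC_ORDER is an assumption here.
ALLOC_ORDER = [3, 2, 1, 0, 12, 14, 4, 5, 6, 7, 8, 9, 10, 11]


def pick_hard_reg(costs, forbidden, alloc_order=ALLOC_ORDER):
    """Cheapest allowed hard register; ties go to the earliest in alloc_order."""
    candidates = [r for r in alloc_order if r not in forbidden]
    # min() returns the first minimal element, so list order breaks ties.
    return min(candidates, key=lambda r: costs.get(r, 0))


def improve(current, partner_reg, forbidden, copy_saving):
    """Move to the copy partner's hard register when allowed and profitable."""
    if partner_reg != current and partner_reg not in forbidden and copy_saving > 0:
        return partner_reg
    return current


# r116: all costs equal, hard reg 0 excluded by the conflict.
first = pick_hard_reg(costs={}, forbidden={0})
# r112 (connected to r116 by a copy) ended up in hard reg 4.
final = improve(first, partner_reg=4, forbidden={0}, copy_saving=125)
print(first, final)  # 3 4
```

In this model r0 is never considered because it sits in the forbidden set, which is why the answer to "why not r0" has to come from the conflict computation rather than from the cost comparison.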

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #17 from Jakub Jelinek  ---
(In reply to Segher Boessenkool from comment #15)
>   Forming thread by copy 0:a0r111-a4r117 (freq=500):
> Result (freq=3500): a0r111(2500) a4r117(1000)
>   Forming thread by copy 2:a3r112-a5r116 (freq=125):
> Result (freq=4500): a3r112(1500) a5r116(3000)
>   Forming thread by copy 1:a2r114-a3r112 (freq=62):
> Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
>   Pushing a1(r113,l0)(cost 0)
>   Pushing a4(r117,l0)(cost 0)
>   Pushing a0(r111,l0)(cost 0)
>   Pushing a2(r114,l0)(cost 0)
>   Pushing a3(r112,l0)(cost 0)
>   Pushing a5(r116,l0)(cost 0)
>   Popping a5(r116,l0)  -- assign reg 3
>   Popping a3(r112,l0)  -- assign reg 4
>   Popping a2(r114,l0)  -- assign reg 3
>   Popping a0(r111,l0)  -- assign reg 0
>   Popping a4(r117,l0)  -- assign reg 0
>   Popping a1(r113,l0)  -- assign reg 2
> Assigning 4 to a5r116
> Disposition:
> 0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
> 5:r116 l0 4    4:r117 l0 0
> 
> 
> r116 does not conflict with *any* other pseudo.  It is alive in the first
> two insns of the function, which are
> 
> (insn 50 3 7 2 (set (reg:SI 116)
> (reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181
> {*arm_movsi_insn}
>  (nil))
> (insn 7 50 8 2 (parallel [
> (set (reg:CC 100 cc)
> (compare:CC (reg:SI 116)
> (const_int 0 [0])))
> (set (reg/v:SI 112 [ a ])
> (reg:SI 116))
> ]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
>  (expr_list:REG_DEAD (reg:SI 116)
> (nil)))
> 
> r0 _is_ used by a successor (as the argument for the call to foo), but we
> could use r0 for r116 anyway, since what we assign to it is r0 :-)

CCing Vlad on this.  I don't see that *movsi_compare0 would in any way prefer
the =r,0 alternative over =r,r, and using the =r,r alternative would allow
removing one instruction.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #16 from Segher Boessenkool  ---
(Which would make insn 50 go away, if you prefer to look at it that way).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #15 from Segher Boessenkool  ---
  Forming thread by copy 0:a0r111-a4r117 (freq=500):
Result (freq=3500): a0r111(2500) a4r117(1000)
  Forming thread by copy 2:a3r112-a5r116 (freq=125):
Result (freq=4500): a3r112(1500) a5r116(3000)
  Forming thread by copy 1:a2r114-a3r112 (freq=62):
Result (freq=5500): a2r114(1000) a3r112(1500) a5r116(3000)
  Pushing a1(r113,l0)(cost 0)
  Pushing a4(r117,l0)(cost 0)
  Pushing a0(r111,l0)(cost 0)
  Pushing a2(r114,l0)(cost 0)
  Pushing a3(r112,l0)(cost 0)
  Pushing a5(r116,l0)(cost 0)
  Popping a5(r116,l0)  -- assign reg 3
  Popping a3(r112,l0)  -- assign reg 4
  Popping a2(r114,l0)  -- assign reg 3
  Popping a0(r111,l0)  -- assign reg 0
  Popping a4(r117,l0)  -- assign reg 0
  Popping a1(r113,l0)  -- assign reg 2
Assigning 4 to a5r116
Disposition:
0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
5:r116 l0 4    4:r117 l0 0


r116 does not conflict with *any* other pseudo.  It is alive in the first
two insns of the function, which are

(insn 50 3 7 2 (set (reg:SI 116)
(reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 181
{*arm_movsi_insn}
 (nil))
(insn 7 50 8 2 (parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 116)
(const_int 0 [0])))
(set (reg/v:SI 112 [ a ])
(reg:SI 116))
]) "ira-shrinkwrap-prep-1.c":17:6 188 {*movsi_compare0}
 (expr_list:REG_DEAD (reg:SI 116)
(nil)))

r0 _is_ used by a successor (as the argument for the call to foo), but we
could use r0 for r116 anyway, since what we assign to it is r0 :-)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #14 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #12)
> Disposition:
> 0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
> 5:r116 l0 4    4:r117 l0 0
> 
> If r116 had been allocated hard reg 0 all would be fine (and we know r116
> dies in insn 7 already, there is a REG_DEAD note on it).

What was the order of assignment?  If r116 conflicts with r111 or r117 and they
were assigned first, then that's just bad luck.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #13 from Richard Biener  ---
Can we xfail/defer the bug?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #12 from Segher Boessenkool  ---
(In reply to Segher Boessenkool from comment #11)
> (In reply to Wilco from comment #8)
> > mov r4, r0
> > cmp r4, #0
> 
> Why does it copy r0 to r4 and then compare r4?  On more modern machines it
> is faster to compare r0 itself, and it would allow shrink-wrapping to work
> fine here

We get this in combine:

Trying 2 -> 7:
2: r112:SI=r116:SI
  REG_DEAD r116:SI
7: cc:CC=cmp(r112:SI,0)
Successfully matched this instruction:
(parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 116)
(const_int 0 [0])))
(set (reg/v:SI 112 [ a ])
(reg:SI 116))
])

(that's *movsi_compare0).


This is preceded by

(insn 50 3 7 2 (set (reg:SI 116)
(reg:SI 0 r0 [ a ])) "ira-shrinkwrap-prep-1.c":14:1 179
{*arm_movsi_insn}
 (nil))


And it stays that way until IRA, which does

Disposition:
0:r111 l0 0    3:r112 l0 4    1:r113 l0 2    2:r114 l0 3
5:r116 l0 4    4:r117 l0 0

If r116 had been allocated hard reg 0 all would be fine (and we know r116
dies in insn 7 already, there is a REG_DEAD note on it).

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #11 from Segher Boessenkool  ---
(In reply to Wilco from comment #8)
>         push    {r4, lr}
>         mov     r4, r0
>         cmp     r4, #0

Why does it copy r0 to r4 and then compare r4?  On more modern machines it
is faster to compare r0 itself, and it would allow shrink-wrapping to work
fine here (well, need to move the assignment to r4 down to the block where
it is used, but something will certainly do that, and it is one of the
shrink-wrapping improvements I want to do for GCC 10).

> It seems shrinkwrapping is more random, sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC's.

Shrink-wrapping is very predictable.  But no block where a non-volatile
register is used or set will get shrink-wrapped.  This limitation has
existed since forever.
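
As a rough illustration of that limitation (a simplified model, not the actual shrink-wrapping pass): treat each basic block as the set of registers it uses or sets, and let a block run before the prologue only if it touches no non-volatile register.  The register set, the block names, and `blocks_before_prologue` are assumptions made up for this sketch, and the model ignores other reasons a block may need the prologue.

```python
# Assumed set of non-volatile (call-saved) core registers for this sketch.
CALL_SAVED = {"r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11"}


def blocks_before_prologue(cfg, regs, entry):
    """Blocks reachable from entry without touching a call-saved register."""
    if regs[entry] & CALL_SAVED:
        return set()
    seen = {entry}
    work = [entry]
    while work:
        block = work.pop()
        for succ in cfg.get(block, []):
            if succ not in seen and not (regs[succ] & CALL_SAVED):
                seen.add(succ)
                work.append(succ)
    return seen


cfg = {"entry": ["call", "ret1"], "call": [], "ret1": []}

# Old code: the entry block only tests r0 (cbz r0, .L5).
old_regs = {"entry": {"r0"}, "call": {"r0", "r2", "r4"}, "ret1": {"r0"}}
# New code: the entry block copies r0 to r4 and compares r4.
new_regs = {"entry": {"r0", "r4"}, "call": {"r0", "r2", "r3", "r4"}, "ret1": {"r0"}}

print(sorted(blocks_before_prologue(cfg, old_regs, "entry")))
print(sorted(blocks_before_prologue(cfg, new_regs, "entry")))
```

In the old code the entry block and the early-return block can both stay outside the prologue; once the entry block touches r4, nothing can.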

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Earnshaw  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rearnsha at gcc dot gnu.org

--- Comment #10 from Richard Earnshaw  ---
I wonder if this could be picked up in the post-reload CSE pass?  (ie rewriting
the CBZ to use the incoming hard reg?)

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-04-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #9 from Richard Earnshaw  ---
(In reply to Wilco from comment #8)
> (In reply to Segher Boessenkool from comment #5)
> > The first one just needs an xfail.  I don't know if it should be *-*-* there
> > or only arm*-*-* should be added.
> > 
> > The other two need some debugging by someone who knows the target and/or
> > these tests.
> 
> The previous code for Arm was:
> 
>         cbz     r0, .L5
>         push    {r4, lr}
>         mov     r4, r0
>         bl      foo
>         movw    r2, #:lower16:.LANCHOR0
>         movt    r2, #:upper16:.LANCHOR0
>         add     r4, r4, r0
>         str     r4, [r2]
>         pop     {r4, pc}
> .L5:
>         movs    r0, #1
>         bx      lr
> 
> Now it fails to shrinkwrap:
> 
>         push    {r4, lr}
>         mov     r4, r0
>         cmp     r4, #0
>         moveq   r0, #1
>         beq     .L3
>         bl      foo
>         ldr     r2, .L7
>         add     r3, r4, r0
>         str     r3, [r2]
> .L3:
>         pop     {r4, lr}
>         bx      lr
> 
> It seems shrinkwrapping is more random, sometimes it's done as expected,
> sometimes it is not. It was more consistent on older GCC's.

This looks like another fallout of not allowing combine to merge with hard
regs.  Previously the CBZ could be moved outside of the prologue because it
operated directly on the incoming hard reg.  Now it only sees the value after
the copy into the pseudo, which is a call-saved reg because it's live over the
call.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2019-02-06 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #8 from Wilco  ---
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
> 
> The other two need some debugging by someone who knows the target and/or
> these tests.

The previous code for Arm was:

        cbz     r0, .L5
        push    {r4, lr}
        mov     r4, r0
        bl      foo
        movw    r2, #:lower16:.LANCHOR0
        movt    r2, #:upper16:.LANCHOR0
        add     r4, r4, r0
        str     r4, [r2]
        pop     {r4, pc}
.L5:
        movs    r0, #1
        bx      lr

Now it fails to shrinkwrap:

        push    {r4, lr}
        mov     r4, r0
        cmp     r4, #0
        moveq   r0, #1
        beq     .L3
        bl      foo
        ldr     r2, .L7
        add     r3, r4, r0
        str     r3, [r2]
.L3:
        pop     {r4, lr}
        bx      lr

It seems shrinkwrapping is more random, sometimes it's done as expected,
sometimes it is not. It was more consistent on older GCC's.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-12-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P1

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-12-14 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Ramana Radhakrishnan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-12-14
     Ever confirmed|0                           |1

--- Comment #7 from Ramana Radhakrishnan  ---
Confirmed.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-12-14 Thread ramana at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Ramana Radhakrishnan  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org

--- Comment #6 from Ramana Radhakrishnan  ---
(In reply to Segher Boessenkool from comment #5)
> The first one just needs an xfail.  I don't know if it should be *-*-* there
> or only arm*-*-* should be added.
> 
> The other two need some debugging by someone who knows the target and/or
> these tests.

For the addr-modes-float.c case, additional vmovs are being generated, so this
is certainly a regression.

--- 8.s 2018-12-14 09:41:04.367843079 +
+++ addr-modes-float.s  2018-12-14 09:40:39.907980812 +
@@ -139,10 +139,13 @@
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
+       vmov    q8, q0  @ ti
        mov     r3, r0
+       vmov    q9, q1  @ ti
        add     r0, r0, #48
-       vst3.8  {d0, d2, d4}, [r3]!
-       vst3.8  {d1, d3, d5}, [r3]
+       vmov    q10, q2  @ ti
+       vst3.8  {d16, d18, d20}, [r3]!
+       vst3.8  {d17, d19, d21}, [r3]

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #5 from Segher Boessenkool  ---
The first one just needs an xfail.  I don't know if it should be *-*-* there
or only arm*-*-* should be added.

The other two need some debugging by someone who knows the target and/or
these tests.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-20 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #4 from Christophe Lyon  ---
As of r266293, the following regressions reported here are still failing:
FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
FAIL: gcc.target/arm/addr-modes-float.c scan-assembler vst3.8\t{d[02468], d[02468], d[02468]}, \\[r[0-9]+\\]!
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times strh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vst1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
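
For the addr-modes-float.c pattern, a small check shows why the scan stops matching once the registers change.  This is a sketch only: the pattern is a hand transcription into Python regex syntax (braces escaped, Tcl-level backslashes reduced), `scan_assembler` is a hypothetical stand-in for the DejaGnu directive, and the two sample lines use the operands from the assembly diff in comment #6.

```python
import re

# Hand transcription of the scan-assembler pattern; an assumption, not the
# exact string the testsuite passes to Tcl.
VST3_PATTERN = re.compile(
    r"vst3.8\t\{d[02468], d[02468], d[02468]\}, \[r[0-9]+\]!"
)


def scan_assembler(asm_text):
    """Return True when the pattern occurs anywhere in the assembly text."""
    return VST3_PATTERN.search(asm_text) is not None


old_line = "\tvst3.8\t{d0, d2, d4}, [r3]!\n"
new_line = "\tvst3.8\t{d16, d18, d20}, [r3]!\n"

print(scan_assembler(old_line))
print(scan_assembler(new_line))
```

Each `d[02468]` accepts a single even digit, so the pattern only matches d0 to d8.  The two-digit registers d16, d18 and d20 fall outside it even though the store itself has the same shape.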

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-20 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #3 from Segher Boessenkool  ---
I don't know, this is up to the arm people.  I don't know if all problems
reported here are fixed now.

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška  ---
Segher: Can the bug be marked as resolved?

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-05 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #1 from Segher Boessenkool  ---
Author: segher
Date: Mon Nov  5 21:18:22 2018
New Revision: 265821

URL: https://gcc.gnu.org/viewcvs?rev=265821&root=gcc&view=rev
Log:
combine: Don't make an intermediate reg for assigning to sfp (PR87871)

The code with an intermediate register is perfectly fine, but LRA
apparently cannot handle the resulting code, or perhaps something else
is wrong.  In either case, making an extra temporary will not likely
help here, so let's just skip it.


PR rtl-optimization/87871
* combine.c (make_more_copies): Skip if dest is frame_pointer_rtx.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/combine.c

[Bug rtl-optimization/87871] [9 Regression] testcases fail after r265398 on arm

2018-11-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0