[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2015-11-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

Segher Boessenkool  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Segher Boessenkool  ---
As Andrew notes in comment #3, "addic." is not microcoded on Cell BE.
I fixed this misclassification about a year ago (it used to be type
"compare", now is "add").

Current trunk also does not do a load/store; all is good now.

[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2015-11-22 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||segher at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #10 from Segher Boessenkool  ---
We no longer generate addic. for this testcase, but that is an accident
(combine first makes dec+cmp into an addic., but then also combines it
with the conditional branch into a bdnz pattern; this needs splitting
later, and since r218591 we no longer split to addic.).

*add<mode>3_imm_{dot,dot2} should have rs6000_gen_cell_microcode in
the condition.  Mine.

[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2011-11-29 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
 AssignedTo|pinskia at gcc dot gnu.org  |unassigned at gcc dot
   ||gnu.org

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-29 23:18:46 UTC ---
No longer working on this.


[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2011-11-29 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41868

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-11-29 23:18:32 UTC ---
No longer working on this.


[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-03 Thread siarhei dot siamashka at gmail dot com


--- Comment #7 from siarhei dot siamashka at gmail dot com  2009-11-03 20:09 ---
Thanks a lot for checking this. And sorry about the confusion caused by
attributing the slowness of the testcase to microcode (which turned out
not to be the case) without properly checking this first.

So should this bug be split into two? One about the incorrect warning, and
another one about generating nonoptimal code at -O2 level (extra load and store
operations, which are probably penalized by something like RAW hazard in such a
short loop)?



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-02 Thread pinskia at gcc dot gnu dot org


--- Comment #2 from pinskia at gcc dot gnu dot org  2009-11-02 16:51 ---
Simple patch which I am testing right now:
Index: gcc/gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/gcc/config/rs6000/rs6000.md (revision 153680)
+++ gcc/gcc/config/rs6000/rs6000.md (working copy)
@@ -1627,7 +1627,7 @@ (define_insn "*add<mode>3_internal3"
    (set_attr "length" "4,4,8,8")])

 (define_split
-  [(set (match_operand:CC 3 "cc_reg_not_cr0_operand" "")
+  [(set (match_operand:CC 3 "cc_reg_not_micro_cr0_operand" "")
        (compare:CC (plus:P (match_operand:P 1 "gpc_reg_operand" "")
                            (match_operand:P 2 "reg_or_short_operand" ""))


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
   |dot org |org
 Status|UNCONFIRMED |ASSIGNED
 Ever Confirmed|0   |1
   Last reconfirmed|0000-00-00 00:00:00 |2009-11-02 16:51:40
   date||



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-02 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2009-11-02 16:56 ---
Actually the warning is incorrect at least according to the PPU book 4.



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-02 Thread pinskia at gcc dot gnu dot org


--- Comment #4 from pinskia at gcc dot gnu dot org  2009-11-02 17:05 ---
In fact changing the addic. into addic/cmpwi does not improve the speed of
the code:


With the change:
[apin...@dhcp-10-98-10-216 local]$ time ./a.out
56.316u 0.084s 0:57.09 98.7% 0+0k 0+0io 0pf+0w

Without:
56.276u 0.088s 0:57.08 98.7% 0+0k 0+0io 0pf+0w


So the warning is simply incorrect.

With -Os on the trunk:
24.144u 0.032s 0:24.45 98.8% 0+0k 0+0io 0pf+0w


I don't know offhand why -Os is faster than -O2.
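
For scale, the user times quoted above put -O2 at a bit more than twice the -Os time. A throwaway C helper that does the division is sketched below; the function name `slowdown` is hypothetical and not part of the bug report.

```c
#include <assert.h>

/* Hypothetical helper: how many times slower one run is than another,
   given the two user times in seconds. */
double slowdown(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);   /* guard against division by zero */
    return slow_seconds / fast_seconds;
}
```

Calling slowdown(56.276, 24.144) with the -O2 and -Os user times from this comment gives roughly 2.3.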



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-02 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2009-11-02 17:08 ---
In fact doing the following diff to the -Os assembly:
--- t5.Os.s 2009-11-02 23:18:52.0 +0900
+++ t5.Os.dot.s 2009-11-02 23:20:19.0 +0900
@@ -29,9 +29,9 @@ x:
 .L4:
        bl y
 .L3:
-       cmpwi 7,31,0
-       addi 31,31,-1
-       bne 7,.L4
+#      cmpwi 7,31,0
+       addic. 31,31,-1
+       bne .L4
        addi 11,1,16
        b _restgpr_31_x
        .size   x,.-x

produces the same result as -Os on the trunk.
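
One detail of the hand edit in this diff: cmpwi tests the counter before the decrement, while addic. sets cr0 from the value after the decrement, so the edited loop appears to run one iteration fewer. For a timing comparison with a large count that difference should be negligible. The C model below is a simplified sketch, not compiler output. It assumes the loop is entered at .L3 (as in the -Os listing from comment #1) and models only the counter; the function names are hypothetical.

```c
#include <assert.h>

/* Model of the original -Os loop tail: the compare sees the OLD counter
   value, then the counter is decremented.  Returns how often "bl y" runs. */
long calls_cmpwi_then_addi(int n)
{
    long calls = 0;
    for (;;) {
        int old_nonzero = (n != 0);   /* cmpwi 7,31,0   */
        n -= 1;                       /* addi 31,31,-1  */
        if (!old_nonzero)             /* bne 7,.L4 falls through */
            break;
        calls++;                      /* bl y */
    }
    return calls;
}

/* Model of the hand-edited tail: addic. decrements and sets cr0 from the
   NEW counter value.  Only meaningful for n >= 1 in this sketch. */
long calls_addic_dot(int n)
{
    long calls = 0;
    assert(n >= 1);
    for (;;) {
        n -= 1;                       /* addic. 31,31,-1 */
        if (n == 0)                   /* bne .L4 falls through */
            break;
        calls++;                      /* bl y */
    }
    return calls;
}
```

With a start value of 3, the first model counts 3 calls and the second counts 2.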



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-11-02 Thread pinskia at gcc dot gnu dot org


--- Comment #6 from pinskia at gcc dot gnu dot org  2009-11-02 17:10 ---
So in conclusion, addic. is not microcoded and the warning is incorrect but
still -Os is faster than -O2.



[Bug target/41868] cell microcode instruction (addic.) is generated for a trivial loop with -O2 optimizations, hurting performance badly

2009-10-29 Thread siarhei dot siamashka at gmail dot com


--- Comment #1 from siarhei dot siamashka at gmail dot com  2009-10-29 15:21 ---
-O2:

0000000000000010 <.x>:
  10:   2c 23 00 00     cmpdi   r3,0
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   f8 21 ff 81     stdu    r1,-128(r1)
  20:   41 82 00 1c     beq-    3c <.x+0x2c>
  24:   f8 61 00 70     std     r3,112(r1)
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   e8 01 00 70     ld      r0,112(r1)
  30:   35 20 ff ff     addic.  r9,r0,-1
  34:   f9 21 00 70     std     r9,112(r1)
  38:   40 82 ff f0     bne+    28 <.x+0x18>
  3c:   38 21 00 80     addi    r1,r1,128
  40:   e8 01 00 10     ld      r0,16(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 00 00 00     lwz     r0,0(0)


-Os:

0000000000000010 <.x>:
  10:   fb e1 ff f8     std     r31,-8(r1)
  14:   7c 08 02 a6     mflr    r0
  18:   f8 01 00 10     std     r0,16(r1)
  1c:   7c 7f 1b 78     mr      r31,r3
  20:   f8 21 ff 81     stdu    r1,-128(r1)
  24:   48 00 00 08     b       2c <.x+0x1c>
  28:   48 00 00 01     bl      28 <.x+0x18>
  2c:   2f bf 00 00     cmpdi   cr7,r31,0
  30:   3b ff ff ff     addi    r31,r31,-1
  34:   40 9e ff f4     bne+    cr7,28 <.x+0x18>
  38:   38 21 00 80     addi    r1,r1,128
  3c:   e8 01 00 10     ld      r0,16(r1)
  40:   eb e1 ff f8     ld      r31,-8(r1)
  44:   7c 08 03 a6     mtlr    r0
  48:   4e 80 00 20     blr
  4c:   00 00 00 00     .long 0x0
  50:   00 00 00 01     .long 0x1
  54:   80 01 00 00     lwz     r0,0(r1)
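
The test case itself is not quoted in this thread. Judging only from the two listings above, x appears to call an external function a given number of times. The -Os code keeps the counter in r31 and tests it before each call, while the -O2 code guards once, keeps the counter in a stack slot (the ld/std pair at 112(r1)), and tests after the decrement. A hypothetical C reconstruction of the two loop shapes follows; every name in it is an assumption made for illustration.

```c
#include <assert.h>

static long y_calls;                 /* hypothetical: counts calls to y() */

/* Stand-in for the external function called from the loop. */
void y(void) { y_calls++; }

/* Shape of the -Os loop: test the counter, decrement it, then call. */
void x_test_at_top(long n)
{
    while (n-- != 0)
        y();
}

/* Shape of the -O2 loop: guard once, then decrement and test after the call. */
void x_test_at_bottom(long n)
{
    if (n == 0)
        return;
    do {
        y();
    } while (--n != 0);
}

/* Returns 1 if both shapes call y() exactly n times (n must be >= 0). */
int same_call_count(long n)
{
    long a, b;
    assert(n >= 0);
    y_calls = 0; x_test_at_top(n);    a = y_calls;
    y_calls = 0; x_test_at_bottom(n); b = y_calls;
    return a == b && a == n;
}
```

Both shapes make the same number of calls for any non-negative count, so the listings differ in where the counter lives and where the test sits, not in what the loop computes.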


-- 

siarhei dot siamashka at gmail dot com changed:

   What|Removed |Added

 CC||siarhei dot siamashka at
   ||gmail dot com
   Keywords||missed-optimization
Summary|cell microcode instruction  |cell microcode instruction
   |is generated for a trivial  |(addic.) is generated for a
   |loop with -O2 optimizations,|trivial loop with -O2
   |hurting performance badly   |optimizations, hurting
   ||performance badly

