[v8-dev] Re: ARM: locals and parameter access without unnecessary push and pop ([email protected]) (issue1340003)

subratokde Mon, 03 May 2010 17:05:59 -0700

Reviewers: Erik Corry, Mads Ager,

Message:
Yes, I verified with the latest v8 bleeding edge. This change is no longer
needed since the register allocator takes care of the issue.




Description:
I am illustrating the optimization with two simple examples. I found some
performance improvement on the benchmarks (v8-suite).

EXAMPLE WITH Binary Op Stubs
++++++++++++++++++++++++++++

  temp1 = x1 + x2;
  temp2 = temp3 - x3;


THIS CODE:
=========
0xf5aa9444   100  e59b0014       ldr r0, [fp, #+20]
0xf5aa9448   104  e52d0004       str r0, [sp, #-4]!
0xf5aa944c   108  e59b0010       ldr r0, [fp, #+16]
0xf5aa9450   112  e49d1004       ldr r1, [sp], #+4

0xf5aa9454 116 e59fc088 ldr ip, [pc, #+136] ;; debug:statement

208
                                                             ;; code: STUB,
GenericBinaryOp, minor: 144
0xf5aa9458   120  e12fff3c       blx ip
0xf5aa945c   124  e50b0010       str r0, [fp, #-16]
0xf5aa9460   128  e51b0018       ldr r0, [fp, #-24]
0xf5aa9464   132  e52d0004       str r0, [sp, #-4]!
0xf5aa9468   136  e59b000c       ldr r0, [fp, #+12]
0xf5aa946c   140  e49d1004       ldr r1, [sp], #+4

0xf5aa9470 144 e59fc070 ldr ip, [pc, #+112] ;; debug:statement

227
                                                             ;; code: STUB,
GenericBinaryOp, minor: 148
0xf5aa9474   148  e12fff3c       blx ip


WITH THIS OPTIMIZATION:
========================
0xf5ae1434    84  e59b1014       ldr r1, [fp, #+20]
0xf5ae1438    88  e59b0010       ldr r0, [fp, #+16]

0xf5ae143c 92 e59fc078 ldr ip, [pc, #+120] ;; debug:statement

208
                                                             ;; code: STUB,
GenericBinaryOp, minor: 144
0xf5ae1440    96  e12fff3c       blx ip
0xf5ae1444   100  e50b0010       str r0, [fp, #-16]
0xf5ae1448   104  e51b1018       ldr r1, [fp, #-24]
0xf5ae144c   108  e59b000c       ldr r0, [fp, #+12]

0xf5ae1450 112 e59fc068 ldr ip, [pc, #+104] ;; debug:statement

227
                                                             ;; code: STUB,
GenericBinaryOp, minor: 148
0xf5ae1454   116  e12fff3c       blx ip


Example with Compares:
++++++++++++++++++++++

  if(x1 > x2) temp4 = x1;
  else temp4 = x2;


THIS CODE:
=========
0xf5a473c0   160  e59b0014       ldr r0, [fp, #+20]
0xf5a473c4   164  e52d0004       str r0, [sp, #-4]!
0xf5a473c8   168  e59b0010       ldr r0, [fp, #+16]
0xf5a473cc   172  e52d0004       str r0, [sp, #-4]!
0xf5a473d0   176  e49d1004       ldr r1, [sp], #+4
0xf5a473d4   180  e49d0004       ldr r0, [sp], #+4
0xf5a473d8   184  e1802001       orr r2, r0, r1
0xf5a473dc   188  e3120001       tst r2, #1
0xf5a473e0   192  0a000003       beq +20 -> 212  (0xf5a473f4)

0xf5a473e4 196 e59fc068 ldr ip, [pc, #+104] ;; debug:statement

148
                                                             ;; code: STUB,
Compare, minor: 92
0xf5a473e8   200  e12fff3c       blx ip


WITH THIS OPTIMIZATION:
========================
0xf5afdc98   152  e59b0014       ldr r0, [fp, #+20]
0xf5afdc9c   156  e59b1010       ldr r1, [fp, #+16]
0xf5afdca0   160  e1802001       orr r2, r0, r1
0xf5afdca4   164  e3120001       tst r2, #1
0xf5afdca8   168  0a000003       beq +20 -> 188  (0xf5afdcbc)

0xf5afdcac 172 e59fc060 ldr ip, [pc, #+96] ;; debug:statement

148
                                                             ;; code: STUB,
Compare, minor: 44
0xf5afdcb0   176  e12fff3c       blx ip

EXPLANATION OF THE CHANGES:
===========================
This change is implemented at the source of the problem, i.e., where the

redundant push/pop instructions are inserted in the instruction stream,ratherthan as a post-pass push/pop elimination phase. The current push/popeliminationphase can only delete the push/pop's of one of the input operands for a2-inputoperation, also it can't eliminate when push/pop use different registers.Alsocan the post-pass push/pop elimination phase be functionally correct ifthere isa control flow merge point in between the push and the pop ? I am not surehow

that condition is checked, or if the design assumptions guarantees that such
situation never arises.

Since, the current implementation is based on the model where few registers(r0,

r1) are pre-assigned for operands to operations with 2 inputs (e.g.,
BinaryOperations) and LoadSlot uses r0 to load a local or parameter and then
push/pop before the actual operation, an initial check is done to see if the
second operand of a 2-input operation meets the criteria for successful
optimization. For a successful optimization, firstly the second operand

shouldn't expand into a sequence of instructions (i.e., it is not anexpression

tree), rather should be a single direct access of a local/parameter as
load(fp+offset).

This verification is implemented in a lightweight fashion are done by someprobefunctions. Also, during actual Visit traversals for generation operandaccess,

LoadAndSpill should go directly to the VisitSlot and LoadSlot through
VisitProxy, and if in between it goes to any other VisitXYZ() operations the

optimization is not done. So for the first operand a probing is not needed,butis needed and hence done only for the second operand before we actuallystart

the traversal through LoadAndSpill for the first operand.


Please review this at http://codereview.chromium.org/1340003/show

SVN Base: http://v8.googlecode.com/svn/branches/bleeding_edge/

Affected files:
  M     src/arm/codegen-arm.h
  M     src/arm/codegen-arm.cc
  M     src/arm/virtual-frame-arm.h
  M     src/arm/virtual-frame-arm.cc


--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

[v8-dev] Re: ARM: locals and parameter access without unnecessary push and pop ([email protected]) (issue1340003)

Reply via email to