[v8-dev] Re: Added fast support for Math.pow. This simply calculates the result using the... (issue660072)

ricow Wed, 03 Mar 2010 22:39:35 -0800


http://codereview.chromium.org/660072/diff/1062/1063
File src/ia32/codegen-ia32.cc (right):


http://codereview.chromium.org/660072/diff/1062/1063#newcode5345
src/ia32/codegen-ia32.cc:5345: __ test(y.reg(), Immediate(1));
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

If you shr(y.reg(),1) here, then the bit that was shifted out is in

the carry

flag. I.e.:
shr(y.reg(), 1);
j(not_carry, &no_multiply);
mulsd(xmm1, xmm0);
bind(&no_multiply);


Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5351
src/ia32/codegen-ia32.cc:5351: __ j(zero, &powi_done);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

Just do:
  mulsd(xmm0, xmm0);
  j(not_zero, &while_true);  // mulsd doesn't affect flags.

It does one more mulsd than necessary, but on, e.g., a core2 chip,

that's just

one cycle, and it saves an instruction on each iteration of the loop.

This and he previous comment should reduce the loop to four

operations. That's

probably only a space saving, since some of the multiplications are

dependent

and have a five cycle latency (and probably want to use the same ALU).


Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5360
src/ia32/codegen-ia32.cc:5360: __ mov(p.reg(), Immediate(0x7FF00000));
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

I don't think this is single-precission infinity.
Single precission floats only have eight exponent bits.
Positive infinity would be 0x7f800000.
(And 0x7ff00000 is a quiet NaN with payload).

Do we need a better test for this case? :)

As discussed offline this will always branch to runtime because of the
way comisd works, I have changed it so that it only goes to runtime if
it is actually infinity.
We already have tests for this (but they where not hit since runtime
executes correctly and we always went there if y is an negative
integer).

http://codereview.chromium.org/660072/diff/1062/1063#newcode5363
src/ia32/codegen-ia32.cc:5363: __ comisd(xmm0, xmm1);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

Use ucomisd instead of comisd.



Done - added ucomisd to assembler-ia32

http://codereview.chromium.org/660072/diff/1062/1063#newcode5376
src/ia32/codegen-ia32.cc:5376: __ comisd(xmm1, xmm1);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

ucomisd


Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5395
src/ia32/codegen-ia32.cc:5395: __ cmp(Operand(p.reg()),
Immediate(HeapNumber::kExponentMask));
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

Add comment here saying something like "p is NaN or +/-Infinity"


Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5409
src/ia32/codegen-ia32.cc:5409: __ comisd(xmm2, xmm1);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

ucomisd



Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5413
src/ia32/codegen-ia32.cc:5413: __ sqrt(xmm0, xmm0);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

Why isn't sqrt called sqrtsd?

Changed name and moved down to sse2 instructions.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5415
src/ia32/codegen-ia32.cc:5415: __ movsd(xmm1, xmm3);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

Move sqrt to the end (i.e., sqrt(xmm1,xmm1)) since it'shas a higher

latency than

movsd.
(And perhaps comment that 1/sqrt(x) = sqrt(1/x)).



Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5421
src/ia32/codegen-ia32.cc:5421: __ mov(p.reg(), Immediate(0x3F000000));
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

You have -0.5 in xmm2 already, and 1.0 in xmm3, so just do
addsd(xmm2, xmm3) to get +0.5.



Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5425
src/ia32/codegen-ia32.cc:5425: __ comisd(xmm2, xmm1);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

ucomisd



Done.

http://codereview.chromium.org/660072/diff/1062/1063#newcode5428
src/ia32/codegen-ia32.cc:5428: __ sqrt(xmm0, xmm0);
On 2010/02/26 09:06:20, Lasse Reichstein wrote:

do
  movsd(xmm1, xmm0);
  sqrt(xmm1, xmm1);
instead, to allow the latency of the sqrt operation to be spent during

the heap

allocation code, instead of stalling on the move.


Done.

http://codereview.chromium.org/660072

--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

[v8-dev] Re: Added fast support for Math.pow. This simply calculates the result using the... (issue660072)

Reply via email to