Can you explain what some of the code in the profile results you sent is doing?

What is all the stuff from address 0xa to 0x47 doing?

What is 0xd5 to 0x147 doing? I'm guessing it's MathRound, but it seems
that could be done with just one or two instructions. Also, the original
code was Math.floor(x + 0.5). If MathRound rounds half to even, then that
is not what we want.
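The two rounding rules only disagree at exact halfway points. The x + 0.5 trick also has one well-known edge case. The comparison below is a small illustration. roundHalfEven is a hypothetical helper written for this note, not a V8 function. It relies on Math.round, which per the ECMAScript specification sends ties toward +Infinity.

```javascript
// The rounding the original code used: ties go toward +Infinity.
function roundHalfUp(x) {
  return Math.floor(x + 0.5);
}

// Hypothetical helper for comparison only (not a V8 function):
// round to nearest, ties to even ("banker's rounding").
function roundHalfEven(x) {
  const r = Math.round(x); // per ECMAScript, ties go toward +Infinity
  // r - x is exactly 0.5 only when x sits halfway between two integers;
  // if Math.round picked the odd neighbor, step down to the even one.
  if (r - x === 0.5 && r % 2 !== 0) {
    return r - 1;
  }
  return r;
}

for (const x of [0.5, 1.5, 2.5, -1.5, -2.5]) {
  console.log(x, roundHalfUp(x), roundHalfEven(x));
}
// Edge case of the x + 0.5 trick: 0.49999999999999994 + 0.5 rounds to 1.0
// in double precision, so the floor is 1 even though x < 0.5.
console.log(roundHalfUp(0.49999999999999994)); // 1
```

If the generated code really rounds half to even, it would return 2 for 2.5 where Math.floor(x + 0.5) returns 3.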

There are various bits of code comparing ebx to small positive
constants. Is that a bounds check on the kTrig array?
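Compares against small constants are not necessarily bounds checks. fdlibm's sin (s_sin.c) is one example. It picks a kernel with switch(n&3) on the quadrant returned by the argument reduction, and a compiler may lower that to compares against 0, 1 and 2. Which of the two the disassembly shows can't be decided from here. This is only a sketch of that dispatch, with Math.sin and Math.cos standing in for hypothetical kernelSin/kernelCos helpers.

```javascript
// Sketch of fdlibm-style quadrant dispatch for sin(x): the argument reduction
// has already split x into n*(pi/2) + y with |y| <= pi/4.
// Math.sin / Math.cos stand in for hypothetical kernelSin / kernelCos helpers.
function sinFromQuadrant(n, y) {
  switch (n & 3) { // quadrant 0..3; also correct for negative n (-1 & 3 === 3)
    case 0:
      return Math.sin(y);
    case 1:
      return Math.cos(y);
    case 2:
      return -Math.sin(y);
    default:
      return -Math.cos(y);
  }
}

console.log(sinFromQuadrant(1, 0.3), Math.sin(Math.PI / 2 + 0.3));
```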

When I compare this disassembly with what gcc produces for the original
fdlibm code, gcc's output seems much smaller and simpler. The actual
computation parts, however, appear roughly equal. It's all the code around
them that probably makes V8 run slower than I would have expected.



On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]> wrote:

> Here's a profile of the 64-bit build. MathSinSlow takes most of the time,
> and the file includes a disassembly of the generated code, with each
> instruction annotated with profiling stats. Note that this runs a modified
> version of SunSpider's 3d-morph that runs longer, to collect more
> profiling samples.
>
> Yang
>
>
> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote:
>
>> On 2014/06/04 16:30:37, Raymond Toy wrote:
>>
>>> On 2014/06/04 07:19:29, Yang wrote:
>>> > On 2014/06/03 16:51:30, Raymond Toy wrote:
>>> > > On 2014/06/03 07:01:45, Yang wrote:
>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/math.js
>>> > > > File src/math.js (right):
>>> > > >
>>> > > >
>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/math.js#newcode262
>>> > > > src/math.js:262: }
>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote:
>>> > > > > As you mentioned via email, you've removed the 3rd iteration. This is
>>> > > > > really needed if you want to be able to reduce multiples of pi/2
>>> > > > > accurately.
>>> > > >
>>> > > > That's true. However, the reduction step is not exposed as a library
>>> > > > function. From what I have seen, the third step seems to only affect
>>> > > > y1. With a y0 really close to y1, it does not change the result of
>>> > > > sine or cosine. This is also why I was asking for a test case where
>>> > > > removing this third step would make a difference.
>>> > >
>>> > > I don't understand what you mean by "y0 really close to y1". What are
>>> > > you saying?
>>> > >
>>> > >
>>> > > tan(Math.PI*45/2) requires the 3rd iteration. ieee754_rem_pio2 returns
>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32]
>>> > >
>>> > > If you ignore the y1 result, we have
>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) -> 1021347742030824.2
>>> > >
>>> > > If you include the y1 result:
>>> > > kernel_tan(-9.790984586812941e-16, -6.820314736619894e-32, -1) ->
>>> > > 1021347742030824.1
>>> >
>>> > I somehow didn't type what I meant. I meant to say: if y0 is really
>>> > close to 0, there does not seem to be any point in investing in the
>>> > third iteration. (I am aware that omitting y1 changes the result in
>>> > some cases; I'm not arguing against that.)
>>> >
>>> > So in the example here, if I omit the third iteration, I get
>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32]
>>> >
>>> > y0 is the same, y1 differs slightly, but the end result is still
>>> > 1021347742030824.1.
>>>
>>> While I understand your desire to reduce the complexity, you are
>>> modifying an algorithm written by an expert. I think the burden is on
>>> you to prove that removing the third iteration does not change the
>>> value of y0.
>>>
>>> Also, where is this coming from? In reality, how often will you compute
>>> sin(x) where x is very near a multiple of pi/2 (where the third
>>> iteration is needed)?
>>>
>>> I suspect it occurs more often than we might expect, but I also suspect
>>> that if you're doing that, you're also computing zillions more values
>>> that are not multiples of pi/2.
>>>
>>> For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0 to 119.
>>> Thus only 8 of the 120 arguments are multiples of pi. If the cost of
>>> both the reduction and the sin computation for multiples of pi/2 were
>>> reduced to exactly zero, you would save only about 6.7% in runtime.
>>>
>>> I think there are more important things to look at. We need profile
>>> results. We need to understand what is really expensive in the
>>> reduction, not what we think is expensive.
>>>
>>
>> I added back the third iteration and tweaked some places, so the
>> runtime is now down to 16ms (vs. the current 12ms).
>>
>> https://codereview.chromium.org/303753002/
>>
>
>
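The "third iteration" in the thread above lives in the medium-argument path of fdlibm's __ieee754_rem_pio2 (e_rem_pio2.c). The sketch below ports that path to JavaScript from the fdlibm C source, not from the V8 patch under review. remPio2Medium, its useThirdIteration flag and biasedExponent are hypothetical names. The sketch also skips fdlibm's npio2_hw fast path and always runs the exponent test instead. As in the tan(Math.PI*45/2) example, the reduced argument comes back as the pair y0 + y1.

```javascript
// Constants from fdlibm's e_rem_pio2.c: pi/2 split into 33-bit pieces plus tails.
const INVPIO2 = 6.36619772367581382433e-01;
const PIO2_1 = 1.57079632673412561417e+00;
const PIO2_1T = 6.07710050650619224932e-11;
const PIO2_2 = 6.07710050630396597660e-11;
const PIO2_2T = 2.02226624879595063154e-21;
const PIO2_3 = 2.02226624871116645580e-21;
const PIO2_3T = 8.47842766036889956997e-32;
const MEDIUM_LIMIT = 823549.6; // roughly 2^19 * (pi/2)

// Biased exponent field of a double (what fdlibm reads via __HI(x) >> 20).
function biasedExponent(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default, so the high word comes first
  return (view.getUint32(0) >>> 20) & 0x7ff;
}

// Hypothetical sketch: x = n*(pi/2) + (y0 + y1) for medium |x|.
// useThirdIteration = false mimics the variant discussed in the thread.
function remPio2Medium(x, useThirdIteration = true) {
  const t0 = Math.abs(x);
  if (!(t0 <= MEDIUM_LIMIT)) {
    throw new RangeError("sketch handles only medium-sized |x|");
  }
  const n = Math.floor(t0 * INVPIO2 + 0.5);
  let r = t0 - n * PIO2_1; // exact: n * PIO2_1 fits in 53 bits
  let w = n * PIO2_1T;     // 1st round, good to about 85 bits
  let y0 = r - w;
  const j = biasedExponent(t0);
  if (j - biasedExponent(y0) > 16) {
    // Heavy cancellation: 2nd iteration, good to about 118 bits.
    let t = r;
    w = n * PIO2_2;
    r = t - w;
    w = n * PIO2_2T - ((t - r) - w);
    y0 = r - w;
    if (useThirdIteration && j - biasedExponent(y0) > 49) {
      // 3rd iteration, about 151 bits.
      t = r;
      w = n * PIO2_3;
      r = t - w;
      w = n * PIO2_3T - ((t - r) - w);
      y0 = r - w;
    }
  }
  const y1 = (r - y0) - w; // low-order correction term
  return x < 0 ? [-n, -y0, -y1] : [n, y0, y1];
}

console.log(remPio2Medium(Math.PI * 45 / 2));
console.log(remPio2Medium(Math.PI * 45 / 2, false));
```

Under these assumptions, both calls should give n = 45 and y0 ≈ -9.790984586812941e-16, with y1 ≈ -6.82e-32. The flag only changes y1 slightly, which matches the values quoted in the thread.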

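This is a quick check of the 3d-morph arithmetic in the thread. It assumes the argument pattern stated there, sin((n-1)*pi/15) for n = 0 to 119. countMultiplesOfPi is a hypothetical helper.

```javascript
// 3d-morph evaluates sin((n - 1) * pi / 15) for n = 0..119.
// (n - 1) * pi / 15 is a multiple of pi/2 only when (n - 1) is a multiple of 15
// (15 is odd), so those arguments are exactly the multiples of pi.
function countMultiplesOfPi(count) {
  let hits = 0;
  for (let n = 0; n < count; n++) {
    if ((n - 1) % 15 === 0) {
      hits++;
    }
  }
  return hits;
}

const hits = countMultiplesOfPi(120);
console.log(hits, hits / 120); // 8 0.06666666666666667
```

8/120 is about 6.7%. That is the upper bound on the savings even if those calls cost nothing.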
-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
