Re: [v8-dev] Re: Trigonometric functions using fdlibm. (issue 303753002)

Yang Guo Thu, 12 Jun 2014 00:37:31 -0700

On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <[email protected]> wrote:


> But what is the function call for?
>

The function call is the slow case for when double to integer (truncated)
conversion using cvttsd2si overflows.


> And for the record, I'm attaching the new profile results.
>
>
>
>
> On Mon, Jun 9, 2014 at 11:57 PM, Yang Guo <[email protected]> wrote:
>
>> Yep. That piece of code does a double to integer conversion.
>>
>>
>> On Tue, Jun 10, 2014 at 12:08 AM, <[email protected]> wrote:
>>
>>> (Yang sent me the new profile result in private email)
>>>
>>> This looks much better.  The only question I have now is what the code
>>> in 0xed to 0x105 is doing. Something related to converting to a float to an
>>> integer; perhaps boxing the result?
>>>
>>> Otherwise, it looks roughly like what gcc does, with a few extra moves
>>> and the bounds checks for kTrig.
>>>
>>>
>>> On Friday, June 6, 2014 9:28:53 AM UTC-7, Yang Guo wrote:
>>>
>>>> Argh. I even prepared it, but totally forgot to send it to you. Will do
>>>> when I get home.
>>>>
>>>> Yang
>>>> On Jun 6, 2014 6:03 PM, "Raymond Toy" <[email protected]> wrote:
>>>>
>>>>>  Thanks for clarifying these results and for providing the modified
>>>>> 3d-morph.
>>>>>
>>>>> When you get a chance could you provide new profile results with
>>>>> MathRound removed? And can you provide the pref results with the event
>>>>> counters enabled so we can see cache effects?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On Fri, Jun 6, 2014 at 6:56 AM, Yang Guo <[email protected]> wrote:
>>>>>
>>>>>> Hi Raymond,
>>>>>>
>>>>>> the modified 3d-morph is attached.
>>>>>>
>>>>>> The code from 0xa to 0x47 are a stack check (at the entry to function
>>>>>> to detect stack overflow) and unboxing the argument into a double 
>>>>>> register
>>>>>> (double numbers are usually boxed in V8 and stored on the heap, except 
>>>>>> for
>>>>>> certain kinds of arrays and in optimized code).
>>>>>>
>>>>>> The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it with
>>>>>> a floor (updated CL) actually gives a slight boost. The modified 3d-morph
>>>>>> goes from 8250ms to 8050ms, and the unmodified one now alternates between
>>>>>> 15ms and 16ms.
>>>>>>
>>>>>> Yes, those comparisons are bounds checks. Unfortunately, out-of-bound
>>>>>> reads on typed arrays in Javascript should return undefined. We already
>>>>>> eliminate some of the redundant bounds checks, but not all can be
>>>>>> eliminated. Of course the generated code for Javascript is a lot larger
>>>>>> than that for C, no surprise there. Javascript is a dynamic language 
>>>>>> after
>>>>>> all. And are right in that we probably should focus on the things that 
>>>>>> add
>>>>>> overhead.
>>>>>>
>>>>>> Moving the calculation to C wouldn't make things faster though, since
>>>>>> the switch to C code is rather expensive, and C code cannot be inlined
>>>>>> either.
>>>>>>
>>>>>> Yang
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 6, 2014 at 12:52 AM, Raymond Toy <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Can you explain what some of the code is in the prof results you
>>>>>>> sent?
>>>>>>>
>>>>>>> What is all the stuff from address 0xa to 0x47 doing?
>>>>>>>
>>>>>>> What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound, but
>>>>>>> it seems that can be done with just one or two instructions.  And the
>>>>>>> original code was Math.floor(x + 0.5).  If MathRound is rounding to 
>>>>>>> even,
>>>>>>> then that is not what we want.
>>>>>>>
>>>>>>> There are some various bits of code comparing ebx to small positive
>>>>>>> constants Is that a bounds check on the kTrig array?
>>>>>>>
>>>>>>> When I compare this disassembly with what gcc produces on the
>>>>>>> original fdlibm code, gcc seems to be much smaller and simpler.  The 
>>>>>>> actual
>>>>>>> computation parts, however, appear roughly equal.  It's all the stuff
>>>>>>> around it that makes V8 probably run slower than I would have expected.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here's a profile of the 64bit build. MathSinSlow takes most of the
>>>>>>>> time, and the file includes a disassembly of the generated code, with 
>>>>>>>> each
>>>>>>>> instruction annotated with profiling stats. Note that this runs an 
>>>>>>>> altered
>>>>>>>> version of SunSpider's 3d-morph to run longer, giving more profiling
>>>>>>>> samples.
>>>>>>>>
>>>>>>>> Yang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 2014/06/04 16:30:37, Raymond Toy wrote:
>>>>>>>>>
>>>>>>>>>> On 2014/06/04 07:19:29, Yang wrote:
>>>>>>>>>> > On 2014/06/03 16:51:30, Raymond Toy wrote:
>>>>>>>>>> > > On 2014/06/03 07:01:45, Yang wrote:
>>>>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/
>>>>>>>>>> math.js
>>>>>>>>>> > > > File src/math.js (right):
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> https://codereview.chromium.org/303753002/diff/40001/src/mat
>>>>>>>>>> h.js#newcode262
>>>>>>>>>> > > > src/math.js:262: }
>>>>>>>>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote:
>>>>>>>>>> > > > > As you mentioned via email, you've removed the 3rd
>>>>>>>>>> iteration. This is
>>>>>>>>>> > really
>>>>>>>>>> > > > > needed if you want to be able to reduce multiples of pi/2
>>>>>>>>>> accurately.
>>>>>>>>>> > > >
>>>>>>>>>> > > > That's true. However, the reduction step is not exposed as
>>>>>>>>>> a library
>>>>>>>>>> > function.
>>>>>>>>>> > > > From what I have seen, the third step seems to only affect
>>>>>>>>>> y1. With a y0
>>>>>>>>>> > > really
>>>>>>>>>> > > > close to y1, it does not change the result of sine or
>>>>>>>>>> cosine. This is
>>>>>>>>>>
>>>>>>>>> also
>>>>>>>>>
>>>>>>>>>> > why
>>>>>>>>>> > > I
>>>>>>>>>> > > > was asking for a test case where removing this third step
>>>>>>>>>> would make a
>>>>>>>>>> > > > difference.
>>>>>>>>>> > >
>>>>>>>>>> > > I don't understand what you mean by "y0 really close to y1".
>>>>>>>>>>  What are you
>>>>>>>>>> > > saying?
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > tan(Math.PI*45/2) requires the 3rd iteration.
>>>>>>>>>> ieee754_rem_pio2 returns
>>>>>>>>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32]
>>>>>>>>>> > >
>>>>>>>>>> > > If you ignore the y1 result, we have
>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) ->
>>>>>>>>>> 1021347742030824.2
>>>>>>>>>> > >
>>>>>>>>>> > > If you include the y1 result:
>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32,
>>>>>>>>>> -1) ->
>>>>>>>>>> > > 1021347742030824.1
>>>>>>>>>> >
>>>>>>>>>> > I somehow didn't type what I thought. I meant to say: if y0 is
>>>>>>>>>> really close
>>>>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>>> > 0, there does not seem to be any point to invest in the third
>>>>>>>>>> loop. (I am
>>>>>>>>>> aware
>>>>>>>>>> > that omitting y1 changes the result in some cases. I'm not
>>>>>>>>>> arguing this).
>>>>>>>>>> >
>>>>>>>>>> > So in the example here, if I omit the third iteration, I get
>>>>>>>>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32]
>>>>>>>>>> >
>>>>>>>>>> > y0 is the same, y1 differs slightly, but the end result is still
>>>>>>>>>> > 1021347742030824.1.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  While I understand your desire to reduce the complexity, you are
>>>>>>>>>> modifying an
>>>>>>>>>> algorithm written by an expert.  I think the burden is on you to
>>>>>>>>>> prove that by
>>>>>>>>>> removing the third iteration you do not change the value of y0.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  Also, where is this coming from?  In reality, how often will you
>>>>>>>>>> compute
>>>>>>>>>>
>>>>>>>>> sin(x)
>>>>>>>>>
>>>>>>>>>> where x is very near a multiple of pi/2 (where the third
>>>>>>>>>> iteration is needed)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  I suspect it occurs more often than we might expect, but also
>>>>>>>>>> that if you're
>>>>>>>>>> doing that, I think you're also computing zillions more values
>>>>>>>>>> that are not a
>>>>>>>>>> multiple of pi/2.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0
>>>>>>>>>> to 119.  Thus
>>>>>>>>>> out of 120 values, we have a multiple of pi just 8 times out of
>>>>>>>>>> 120. If the
>>>>>>>>>>
>>>>>>>>> cost
>>>>>>>>>
>>>>>>>>>> of reduction for multiples of pi/2 AND the computation of sin
>>>>>>>>>> were reduced to
>>>>>>>>>> exactly zero, you would save about just 6.6% in runtime.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  I think there are more important things to look at.  We need
>>>>>>>>>> profile results.
>>>>>>>>>>
>>>>>>>>> We
>>>>>>>>>
>>>>>>>>>> need to understand what is really expensive in the reduction, not
>>>>>>>>>> what we
>>>>>>>>>>
>>>>>>>>> think
>>>>>>>>>
>>>>>>>>>> is expensive.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I added back the third iteration, and tweaked some places, so that
>>>>>>>>> the runtime
>>>>>>>>> is now down to 16ms (vs the current 12ms).
>>>>>>>>>
>>>>>>>>> https://codereview.chromium.org/303753002/
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>   --
>>> --
>>> v8-dev mailing list
>>> [email protected]
>>> http://groups.google.com/group/v8-dev
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "v8-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to a topic in the
>> Google Groups "v8-dev" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [v8-dev] Re: Trigonometric functions using fdlibm. (issue 303753002)

Reply via email to