[v8-dev] Re: Trigonometric functions using fdlibm. (issue 303753002)

rtoy Mon, 09 Jun 2014 15:09:11 -0700

(Yang sent me the new profile result in private email)

This looks much better.  The only question I have now is what the code from 
0xed to 0x105 is doing. Something related to converting a float to an 
integer; perhaps boxing the result?


Otherwise, it looks roughly like what gcc does, with a few extra moves and 
the bounds checks for kTrig.
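
To make the bounds-check point concrete, here is a minimal sketch of why a typed-array read needs one. The table is a hypothetical stand-in (kTrigSketch and readEntry are illustrative names and the values are made up); the real kTrig contents are not shown in this thread.

```javascript
// Hypothetical stand-in for the kTrig table; names and values are assumptions.
const kTrigSketch = new Float64Array([0.0, 0.25, 0.5, 0.75]);

function readEntry(table, index) {
  // In-range reads yield a number. Out-of-range reads on a typed array
  // yield undefined instead of reading past the end of the buffer.
  return table[index];
}

const inRange = readEntry(kTrigSketch, 3);
const outOfRange = readEntry(kTrigSketch, 4);
console.log(inRange, outOfRange);
```

Since an out-of-range index has to produce undefined, the generated code has to compare the index against the length unless it can prove the index is in range.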

On Friday, June 6, 2014 9:28:53 AM UTC-7, Yang Guo wrote:
>
> Argh. I even prepared it, but totally forgot to send it to you. Will do 
> when I get home.
>
> Yang
> On Jun 6, 2014 6:03 PM, "Raymond Toy" <[email protected]> wrote:
>
>> Thanks for clarifying these results and for providing the modified 
>> 3d-morph.
>>
>> When you get a chance, could you provide new profile results with 
>> MathRound removed? And can you provide the perf results with the event 
>> counters enabled so we can see cache effects?
>>
>> Thanks!
>>
>>
>> On Fri, Jun 6, 2014 at 6:56 AM, Yang Guo <[email protected]> wrote:
>>
>>> Hi Raymond,
>>>
>>> the modified 3d-morph is attached.
>>>
>>> The code from 0xa to 0x47 is a stack check (at function entry, to 
>>> detect stack overflow) followed by unboxing the argument into a double 
>>> register (double numbers are usually boxed in V8 and stored on the heap, 
>>> except for certain kinds of arrays and in optimized code).
>>>
>>> The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it with a 
>>> floor (updated CL) actually gives a slight boost. The modified 3d-morph 
>>> goes from 8250ms to 8050ms, and the unmodified one now alternates between 
>>> 15ms and 16ms.
>>>
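
For reference, a small sketch of how Math.round and Math.floor(x + 0.5) treat halfway cases in JavaScript. The helper name roundViaFloor is only illustrative, and this says nothing about how V8's internal MathRound is compiled.

```javascript
// Illustrative helper: the rounding form used in the original code.
function roundViaFloor(x) {
  return Math.floor(x + 0.5);
}

// Halfway cases: both forms round toward +Infinity, not to even.
const tieUp = [Math.round(2.5), roundViaFloor(2.5)];
const tieNegative = [Math.round(-2.5), roundViaFloor(-2.5)];

// Largest double below 0.5: adding 0.5 already rounds up to 1 in double arithmetic.
const justBelowHalf = roundViaFloor(0.49999999999999994);

console.log(tieUp, tieNegative, justBelowHalf);
```

Neither form rounds to even (which would give 2 for 2.5). The two can still differ at edge cases such as 0.49999999999999994, where x + 0.5 is no longer below 1.
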
>>> Yes, those comparisons are bounds checks. Unfortunately, out-of-bounds 
>>> reads on typed arrays in JavaScript should return undefined. We already 
>>> eliminate some of the redundant bounds checks, but not all can be 
>>> eliminated. Of course the generated code for JavaScript is a lot larger 
>>> than that for C, no surprise there. JavaScript is a dynamic language after 
>>> all. And you are right in that we probably should focus on the things that 
>>> add overhead.
>>>
>>> Moving the calculation to C wouldn't make things faster though, since 
>>> the switch to C code is rather expensive, and C code cannot be inlined 
>>> either.
>>>
>>> Yang
>>>
>>>
>>>
>>> On Fri, Jun 6, 2014 at 12:52 AM, Raymond Toy <[email protected]> wrote:
>>>
>>>> Can you explain what some of the code is in the prof results you sent?
>>>>
>>>> What is all the stuff from address 0xa to 0x47 doing?
>>>>
>>>> What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound, but it 
>>>> seems that can be done with just one or two instructions.  And the 
>>>> original code was Math.floor(x + 0.5).  If MathRound is rounding to even, 
>>>> then that is not what we want.
>>>>
>>>> There are various bits of code comparing ebx to small positive 
>>>> constants. Is that a bounds check on the kTrig array?
>>>>
>>>> When I compare this disassembly with what gcc produces on the original 
>>>> fdlibm code, gcc seems to be much smaller and simpler.  The actual 
>>>> computation parts, however, appear roughly equal.  It's all the stuff 
>>>> around it that makes V8 probably run slower than I would have expected.
>>>>
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]> wrote:
>>>>
>>>>> Here's a profile of the 64bit build. MathSinSlow takes most of the 
>>>>> time, and the file includes a disassembly of the generated code, with 
>>>>> each instruction annotated with profiling stats. Note that this runs an 
>>>>> altered version of SunSpider's 3d-morph to run longer, giving more 
>>>>> profiling samples.
>>>>>
>>>>> Yang
>>>>>
>>>>>
>>>>> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote:
>>>>>
>>>>>> On 2014/06/04 16:30:37, Raymond Toy wrote:
>>>>>>
>>>>>>> On 2014/06/04 07:19:29, Yang wrote:
>>>>>>> > On 2014/06/03 16:51:30, Raymond Toy wrote:
>>>>>>> > > On 2014/06/03 07:01:45, Yang wrote:
>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/math.js
>>>>>>> > > > File src/math.js (right):
>>>>>>> > > >
>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/math.js#newcode262
>>>>>>> > > > src/math.js:262: }
>>>>>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote:
>>>>>>> > > > > As you mentioned via email, you've removed the 3rd iteration.
>>>>>>> > > > > This is really needed if you want to be able to reduce
>>>>>>> > > > > multiples of pi/2 accurately.
>>>>>>> > > >
>>>>>>> > > > That's true. However, the reduction step is not exposed as a
>>>>>>> > > > library function. From what I have seen, the third step seems to
>>>>>>> > > > only affect y1. With a y0 really close to y1, it does not change
>>>>>>> > > > the result of sine or cosine. This is also why I was asking for a
>>>>>>> > > > test case where removing this third step would make a difference.
>>>>>>> > >
>>>>>>> > > I don't understand what you mean by "y0 really close to y1".  What
>>>>>>> > > are you saying?
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > tan(Math.PI*45/2) requires the 3rd iteration. ieee754_rem_pio2 
>>>>>>> returns
>>>>>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32]
>>>>>>> > >
>>>>>>> > > If you ignore the y1 result, we have
>>>>>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) -> 
>>>>>>> 1021347742030824.2
>>>>>>> > >
>>>>>>> > > If you include the y1 result:
>>>>>>> > > kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32, -1) ->
>>>>>>> > > 1021347742030824.1
>>>>>>> >
>>>>>>> > I somehow didn't type what I thought. I meant to say: if y0 is really
>>>>>>> > close to 0, there does not seem to be any point to invest in the third
>>>>>>> > loop. (I am aware that omitting y1 changes the result in some cases.
>>>>>>> > I'm not arguing this).
>>>>>>> >
>>>>>>> > So in the example here, if I omit the third iteration, I get
>>>>>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32]
>>>>>>> >
>>>>>>> > y0 is the same, y1 differs slightly, but the end result is still
>>>>>>> > 1021347742030824.1.
>>>>>>>
>>>>>>
>>>>>>> While I understand your desire to reduce the complexity, you are
>>>>>>> modifying an algorithm written by an expert.  I think the burden is on
>>>>>>> you to prove that by removing the third iteration you do not change the
>>>>>>> value of y0.
>>>>>>>
>>>>>>> Also, where is this coming from?  In reality, how often will you compute
>>>>>>> sin(x) where x is very near a multiple of pi/2 (where the third iteration
>>>>>>> is needed)?
>>>>>>>
>>>>>>> I suspect it occurs more often than we might expect, but also that if
>>>>>>> you're doing that, I think you're also computing zillions more values
>>>>>>> that are not a multiple of pi/2.
>>>>>>>
>>>>>>> For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0 to 119.
>>>>>>> Thus out of 120 values, we have a multiple of pi just 8 times.  If the
>>>>>>> cost of reduction for multiples of pi/2 AND the computation of sin were
>>>>>>> reduced to exactly zero, you would save just about 6.6% in runtime.
>>>>>>>
>>>>>>> I think there are more important things to look at.  We need profile
>>>>>>> results.  We need to understand what is really expensive in the
>>>>>>> reduction, not what we think is expensive.
>>>>>>>
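
The count quoted above is easy to double-check. This is just a sketch of the arithmetic, not code from 3d-morph; the variable names are illustrative. Note that (n-1)*pi/15 = k*pi/2 forces k to be even, so for these arguments the multiples of pi/2 are exactly the multiples of pi.

```javascript
// Count how many of the 120 arguments (n - 1) * pi / 15, n = 0..119,
// are integer multiples of pi.
const total = 120;
let multiplesOfPi = 0;
for (let n = 0; n < total; n++) {
  // (n - 1) * pi / 15 is a multiple of pi exactly when 15 divides (n - 1).
  if ((n - 1) % 15 === 0) {
    multiplesOfPi++;
  }
}

// Upper bound on the saving if those calls cost nothing at all.
const fraction = multiplesOfPi / total;
console.log(multiplesOfPi, fraction);
```

That gives 8 of 120, so even a free reduction for those arguments caps the saving at roughly 6.7% of the sin calls.
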
>>>>>>
>>>>>> I added back the third iteration, and tweaked some places, so that 
>>>>>> the runtime
>>>>>> is now down to 16ms (vs the current 12ms).
>>>>>>
>>>>>> https://codereview.chromium.org/303753002/
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>  

