Re: [v8-dev] Re: Trigonometric functions using fdlibm. (issue 303753002)

Yang Guo Wed, 18 Jun 2014 02:33:08 -0700

Status update:
I haven't had much time recently, and there are still somethings I want to
try before cleaning up the code to get it in shape for landing.
I also discovered that it would tank Kraken's audio-oscillator, but the
reason for that is not the sin/cos performance, but for some reason the
compiler is generating sub-optimal code for the main hot function in that
benchmark. It seems timing-dependent. I need to find out what's happening
there before progressing here.


Yang


On Thu, Jun 12, 2014 at 6:28 PM, Raymond Toy <[email protected]> wrote:

>
>
>
> On Thu, Jun 12, 2014 at 12:37 AM, Yang Guo <[email protected]> wrote:
>
>>
>>
>> On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <[email protected]> wrote:
>>
>>> But what is the function call for?
>>>
>>
>> The function call is the slow case for when double to integer (truncated)
>> conversion using cvttsd2si overflows.
>>
>
> Ah, ok.  If we changed the if statement at line 222 to say abs(x) <
> 1647099.9999, would the compiler be able to derive that n cannot
> overflow?  That be another opportunity to micro-optimize this routine.
> Actually some of the if's could be replaced with float comparisons instead
> of integer comparisons, if that makes a difference.
>
>
>
>>
>>> And for the record, I'm attaching the new profile results.
>>>
>>>
>>>
>>>
>>> On Mon, Jun 9, 2014 at 11:57 PM, Yang Guo <[email protected]> wrote:
>>>
>>>> Yep. That piece of code does a double to integer conversion.
>>>>
>>>>
>>>> On Tue, Jun 10, 2014 at 12:08 AM, <[email protected]> wrote:
>>>>
>>>>> (Yang sent me the new profile result in private email)
>>>>>
>>>>> This looks much better.  The only question I have now is what the code
>>>>> in 0xed to 0x105 is doing. Something related to converting to a float to 
>>>>> an
>>>>> integer; perhaps boxing the result?
>>>>>
>>>>> Otherwise, it looks roughly like what gcc does, with a few extra moves
>>>>> and the bounds checks for kTrig.
>>>>>
>>>>>
>>>>> On Friday, June 6, 2014 9:28:53 AM UTC-7, Yang Guo wrote:
>>>>>
>>>>>> Argh. I even prepared it, but totally forgot to send it to you. Will
>>>>>> do when I get home.
>>>>>>
>>>>>> Yang
>>>>>> On Jun 6, 2014 6:03 PM, "Raymond Toy" <[email protected]> wrote:
>>>>>>
>>>>>>>  Thanks for clarifying these results and for providing the modified
>>>>>>> 3d-morph.
>>>>>>>
>>>>>>> When you get a chance could you provide new profile results with
>>>>>>> MathRound removed? And can you provide the pref results with the event
>>>>>>> counters enabled so we can see cache effects?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 6, 2014 at 6:56 AM, Yang Guo <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Raymond,
>>>>>>>>
>>>>>>>> the modified 3d-morph is attached.
>>>>>>>>
>>>>>>>> The code from 0xa to 0x47 are a stack check (at the entry to
>>>>>>>> function to detect stack overflow) and unboxing the argument into a 
>>>>>>>> double
>>>>>>>> register (double numbers are usually boxed in V8 and stored on the 
>>>>>>>> heap,
>>>>>>>> except for certain kinds of arrays and in optimized code).
>>>>>>>>
>>>>>>>> The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it
>>>>>>>> with a floor (updated CL) actually gives a slight boost. The modified
>>>>>>>> 3d-morph goes from 8250ms to 8050ms, and the unmodified one now 
>>>>>>>> alternates
>>>>>>>> between 15ms and 16ms.
>>>>>>>>
>>>>>>>> Yes, those comparisons are bounds checks. Unfortunately,
>>>>>>>> out-of-bound reads on typed arrays in Javascript should return 
>>>>>>>> undefined.
>>>>>>>> We already eliminate some of the redundant bounds checks, but not all 
>>>>>>>> can
>>>>>>>> be eliminated. Of course the generated code for Javascript is a lot 
>>>>>>>> larger
>>>>>>>> than that for C, no surprise there. Javascript is a dynamic language 
>>>>>>>> after
>>>>>>>> all. And are right in that we probably should focus on the things that 
>>>>>>>> add
>>>>>>>> overhead.
>>>>>>>>
>>>>>>>> Moving the calculation to C wouldn't make things faster though,
>>>>>>>> since the switch to C code is rather expensive, and C code cannot be
>>>>>>>> inlined either.
>>>>>>>>
>>>>>>>> Yang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 6, 2014 at 12:52 AM, Raymond Toy <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you explain what some of the code is in the prof results you
>>>>>>>>> sent?
>>>>>>>>>
>>>>>>>>> What is all the stuff from address 0xa to 0x47 doing?
>>>>>>>>>
>>>>>>>>> What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound,
>>>>>>>>> but it seems that can be done with just one or two instructions.  And 
>>>>>>>>> the
>>>>>>>>> original code was Math.floor(x + 0.5).  If MathRound is rounding to 
>>>>>>>>> even,
>>>>>>>>> then that is not what we want.
>>>>>>>>>
>>>>>>>>> There are some various bits of code comparing ebx to small
>>>>>>>>> positive constants Is that a bounds check on the kTrig array?
>>>>>>>>>
>>>>>>>>> When I compare this disassembly with what gcc produces on the
>>>>>>>>> original fdlibm code, gcc seems to be much smaller and simpler.  The 
>>>>>>>>> actual
>>>>>>>>> computation parts, however, appear roughly equal.  It's all the stuff
>>>>>>>>> around it that makes V8 probably run slower than I would have 
>>>>>>>>> expected.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Here's a profile of the 64bit build. MathSinSlow takes most of
>>>>>>>>>> the time, and the file includes a disassembly of the generated code, 
>>>>>>>>>> with
>>>>>>>>>> each instruction annotated with profiling stats. Note that this runs 
>>>>>>>>>> an
>>>>>>>>>> altered version of SunSpider's 3d-morph to run longer, giving more
>>>>>>>>>> profiling samples.
>>>>>>>>>>
>>>>>>>>>> Yang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2014/06/04 16:30:37, Raymond Toy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 2014/06/04 07:19:29, Yang wrote:
>>>>>>>>>>>> > On 2014/06/03 16:51:30, Raymond Toy wrote:
>>>>>>>>>>>> > > On 2014/06/03 07:01:45, Yang wrote:
>>>>>>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/
>>>>>>>>>>>> math.js
>>>>>>>>>>>> > > > File src/math.js (right):
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > >
>>>>>>>>>>>> https://codereview.chromium.org/303753002/diff/40001/src/mat
>>>>>>>>>>>> h.js#newcode262
>>>>>>>>>>>> > > > src/math.js:262: }
>>>>>>>>>>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote:
>>>>>>>>>>>> > > > > As you mentioned via email, you've removed the 3rd
>>>>>>>>>>>> iteration. This is
>>>>>>>>>>>> > really
>>>>>>>>>>>> > > > > needed if you want to be able to reduce multiples of
>>>>>>>>>>>> pi/2 accurately.
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > > That's true. However, the reduction step is not exposed
>>>>>>>>>>>> as a library
>>>>>>>>>>>> > function.
>>>>>>>>>>>> > > > From what I have seen, the third step seems to only
>>>>>>>>>>>> affect y1. With a y0
>>>>>>>>>>>> > > really
>>>>>>>>>>>> > > > close to y1, it does not change the result of sine or
>>>>>>>>>>>> cosine. This is
>>>>>>>>>>>>
>>>>>>>>>>> also
>>>>>>>>>>>
>>>>>>>>>>>> > why
>>>>>>>>>>>> > > I
>>>>>>>>>>>> > > > was asking for a test case where removing this third step
>>>>>>>>>>>> would make a
>>>>>>>>>>>> > > > difference.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I don't understand what you mean by "y0 really close to
>>>>>>>>>>>> y1".  What are you
>>>>>>>>>>>> > > saying?
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > tan(Math.PI*45/2) requires the 3rd iteration.
>>>>>>>>>>>> ieee754_rem_pio2 returns
>>>>>>>>>>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32]
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > If you ignore the y1 result, we have
>>>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) ->
>>>>>>>>>>>> 1021347742030824.2
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > If you include the y1 result:
>>>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32,
>>>>>>>>>>>> -1) ->
>>>>>>>>>>>> > > 1021347742030824.1
>>>>>>>>>>>> >
>>>>>>>>>>>> > I somehow didn't type what I thought. I meant to say: if y0
>>>>>>>>>>>> is really close
>>>>>>>>>>>>
>>>>>>>>>>> to
>>>>>>>>>>>
>>>>>>>>>>>> > 0, there does not seem to be any point to invest in the third
>>>>>>>>>>>> loop. (I am
>>>>>>>>>>>> aware
>>>>>>>>>>>> > that omitting y1 changes the result in some cases. I'm not
>>>>>>>>>>>> arguing this).
>>>>>>>>>>>> >
>>>>>>>>>>>> > So in the example here, if I omit the third iteration, I get
>>>>>>>>>>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32]
>>>>>>>>>>>> >
>>>>>>>>>>>> > y0 is the same, y1 differs slightly, but the end result is
>>>>>>>>>>>> still
>>>>>>>>>>>> > 1021347742030824.1.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  While I understand your desire to reduce the complexity, you
>>>>>>>>>>>> are modifying an
>>>>>>>>>>>> algorithm written by an expert.  I think the burden is on you
>>>>>>>>>>>> to prove that by
>>>>>>>>>>>> removing the third iteration you do not change the value of y0.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  Also, where is this coming from?  In reality, how often will
>>>>>>>>>>>> you compute
>>>>>>>>>>>>
>>>>>>>>>>> sin(x)
>>>>>>>>>>>
>>>>>>>>>>>> where x is very near a multiple of pi/2 (where the third
>>>>>>>>>>>> iteration is needed)?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  I suspect it occurs more often than we might expect, but also
>>>>>>>>>>>> that if you're
>>>>>>>>>>>> doing that, I think you're also computing zillions more values
>>>>>>>>>>>> that are not a
>>>>>>>>>>>> multiple of pi/2.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0
>>>>>>>>>>>> to 119.  Thus
>>>>>>>>>>>> out of 120 values, we have a multiple of pi just 8 times out of
>>>>>>>>>>>> 120. If the
>>>>>>>>>>>>
>>>>>>>>>>> cost
>>>>>>>>>>>
>>>>>>>>>>>> of reduction for multiples of pi/2 AND the computation of sin
>>>>>>>>>>>> were reduced to
>>>>>>>>>>>> exactly zero, you would save about just 6.6% in runtime.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  I think there are more important things to look at.  We need
>>>>>>>>>>>> profile results.
>>>>>>>>>>>>
>>>>>>>>>>> We
>>>>>>>>>>>
>>>>>>>>>>>> need to understand what is really expensive in the reduction,
>>>>>>>>>>>> not what we
>>>>>>>>>>>>
>>>>>>>>>>> think
>>>>>>>>>>>
>>>>>>>>>>>> is expensive.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I added back the third iteration, and tweaked some places, so
>>>>>>>>>>> that the runtime
>>>>>>>>>>> is now down to 16ms (vs the current 12ms).
>>>>>>>>>>>
>>>>>>>>>>> https://codereview.chromium.org/303753002/
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>   --
>>>>> --
>>>>> v8-dev mailing list
>>>>> [email protected]
>>>>> http://groups.google.com/group/v8-dev
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "v8-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>  --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "v8-dev" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>  --
>>> --
>>> v8-dev mailing list
>>> [email protected]
>>> http://groups.google.com/group/v8-dev
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "v8-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>  --
>> --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to a topic in the
>> Google Groups "v8-dev" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>  --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [v8-dev] Re: Trigonometric functions using fdlibm. (issue 303753002)

Reply via email to