Status update: I haven't had much time recently, and there are still somethings I want to try before cleaning up the code to get it in shape for landing. I also discovered that it would tank Kraken's audio-oscillator, but the reason for that is not the sin/cos performance, but for some reason the compiler is generating sub-optimal code for the main hot function in that benchmark. It seems timing-dependent. I need to find out what's happening there before progressing here.
Yang On Thu, Jun 12, 2014 at 6:28 PM, Raymond Toy <[email protected]> wrote: > > > > On Thu, Jun 12, 2014 at 12:37 AM, Yang Guo <[email protected]> wrote: > >> >> >> On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <[email protected]> wrote: >> >>> But what is the function call for? >>> >> >> The function call is the slow case for when double to integer (truncated) >> conversion using cvttsd2si overflows. >> > > Ah, ok. If we changed the if statement at line 222 to say abs(x) < > 1647099.9999, would the compiler be able to derive that n cannot > overflow? That be another opportunity to micro-optimize this routine. > Actually some of the if's could be replaced with float comparisons instead > of integer comparisons, if that makes a difference. > > > >> >>> And for the record, I'm attaching the new profile results. >>> >>> >>> >>> >>> On Mon, Jun 9, 2014 at 11:57 PM, Yang Guo <[email protected]> wrote: >>> >>>> Yep. That piece of code does a double to integer conversion. >>>> >>>> >>>> On Tue, Jun 10, 2014 at 12:08 AM, <[email protected]> wrote: >>>> >>>>> (Yang sent me the new profile result in private email) >>>>> >>>>> This looks much better. The only question I have now is what the code >>>>> in 0xed to 0x105 is doing. Something related to converting to a float to >>>>> an >>>>> integer; perhaps boxing the result? >>>>> >>>>> Otherwise, it looks roughly like what gcc does, with a few extra moves >>>>> and the bounds checks for kTrig. >>>>> >>>>> >>>>> On Friday, June 6, 2014 9:28:53 AM UTC-7, Yang Guo wrote: >>>>> >>>>>> Argh. I even prepared it, but totally forgot to send it to you. Will >>>>>> do when I get home. >>>>>> >>>>>> Yang >>>>>> On Jun 6, 2014 6:03 PM, "Raymond Toy" <[email protected]> wrote: >>>>>> >>>>>>> Thanks for clarifying these results and for providing the modified >>>>>>> 3d-morph. >>>>>>> >>>>>>> When you get a chance could you provide new profile results with >>>>>>> MathRound removed? And can you provide the pref results with the event >>>>>>> counters enabled so we can see cache effects? >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 6, 2014 at 6:56 AM, Yang Guo <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Raymond, >>>>>>>> >>>>>>>> the modified 3d-morph is attached. >>>>>>>> >>>>>>>> The code from 0xa to 0x47 are a stack check (at the entry to >>>>>>>> function to detect stack overflow) and unboxing the argument into a >>>>>>>> double >>>>>>>> register (double numbers are usually boxed in V8 and stored on the >>>>>>>> heap, >>>>>>>> except for certain kinds of arrays and in optimized code). >>>>>>>> >>>>>>>> The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it >>>>>>>> with a floor (updated CL) actually gives a slight boost. The modified >>>>>>>> 3d-morph goes from 8250ms to 8050ms, and the unmodified one now >>>>>>>> alternates >>>>>>>> between 15ms and 16ms. >>>>>>>> >>>>>>>> Yes, those comparisons are bounds checks. Unfortunately, >>>>>>>> out-of-bound reads on typed arrays in Javascript should return >>>>>>>> undefined. >>>>>>>> We already eliminate some of the redundant bounds checks, but not all >>>>>>>> can >>>>>>>> be eliminated. Of course the generated code for Javascript is a lot >>>>>>>> larger >>>>>>>> than that for C, no surprise there. Javascript is a dynamic language >>>>>>>> after >>>>>>>> all. And are right in that we probably should focus on the things that >>>>>>>> add >>>>>>>> overhead. >>>>>>>> >>>>>>>> Moving the calculation to C wouldn't make things faster though, >>>>>>>> since the switch to C code is rather expensive, and C code cannot be >>>>>>>> inlined either. >>>>>>>> >>>>>>>> Yang >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jun 6, 2014 at 12:52 AM, Raymond Toy <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Can you explain what some of the code is in the prof results you >>>>>>>>> sent? >>>>>>>>> >>>>>>>>> What is all the stuff from address 0xa to 0x47 doing? >>>>>>>>> >>>>>>>>> What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound, >>>>>>>>> but it seems that can be done with just one or two instructions. And >>>>>>>>> the >>>>>>>>> original code was Math.floor(x + 0.5). If MathRound is rounding to >>>>>>>>> even, >>>>>>>>> then that is not what we want. >>>>>>>>> >>>>>>>>> There are some various bits of code comparing ebx to small >>>>>>>>> positive constants Is that a bounds check on the kTrig array? >>>>>>>>> >>>>>>>>> When I compare this disassembly with what gcc produces on the >>>>>>>>> original fdlibm code, gcc seems to be much smaller and simpler. The >>>>>>>>> actual >>>>>>>>> computation parts, however, appear roughly equal. It's all the stuff >>>>>>>>> around it that makes V8 probably run slower than I would have >>>>>>>>> expected. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Here's a profile of the 64bit build. MathSinSlow takes most of >>>>>>>>>> the time, and the file includes a disassembly of the generated code, >>>>>>>>>> with >>>>>>>>>> each instruction annotated with profiling stats. Note that this runs >>>>>>>>>> an >>>>>>>>>> altered version of SunSpider's 3d-morph to run longer, giving more >>>>>>>>>> profiling samples. >>>>>>>>>> >>>>>>>>>> Yang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> On 2014/06/04 16:30:37, Raymond Toy wrote: >>>>>>>>>>> >>>>>>>>>>>> On 2014/06/04 07:19:29, Yang wrote: >>>>>>>>>>>> > On 2014/06/03 16:51:30, Raymond Toy wrote: >>>>>>>>>>>> > > On 2014/06/03 07:01:45, Yang wrote: >>>>>>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/ >>>>>>>>>>>> math.js >>>>>>>>>>>> > > > File src/math.js (right): >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> https://codereview.chromium.org/303753002/diff/40001/src/mat >>>>>>>>>>>> h.js#newcode262 >>>>>>>>>>>> > > > src/math.js:262: } >>>>>>>>>>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote: >>>>>>>>>>>> > > > > As you mentioned via email, you've removed the 3rd >>>>>>>>>>>> iteration. This is >>>>>>>>>>>> > really >>>>>>>>>>>> > > > > needed if you want to be able to reduce multiples of >>>>>>>>>>>> pi/2 accurately. >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > That's true. However, the reduction step is not exposed >>>>>>>>>>>> as a library >>>>>>>>>>>> > function. >>>>>>>>>>>> > > > From what I have seen, the third step seems to only >>>>>>>>>>>> affect y1. With a y0 >>>>>>>>>>>> > > really >>>>>>>>>>>> > > > close to y1, it does not change the result of sine or >>>>>>>>>>>> cosine. This is >>>>>>>>>>>> >>>>>>>>>>> also >>>>>>>>>>> >>>>>>>>>>>> > why >>>>>>>>>>>> > > I >>>>>>>>>>>> > > > was asking for a test case where removing this third step >>>>>>>>>>>> would make a >>>>>>>>>>>> > > > difference. >>>>>>>>>>>> > > >>>>>>>>>>>> > > I don't understand what you mean by "y0 really close to >>>>>>>>>>>> y1". What are you >>>>>>>>>>>> > > saying? >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > tan(Math.PI*45/2) requires the 3rd iteration. >>>>>>>>>>>> ieee754_rem_pio2 returns >>>>>>>>>>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32] >>>>>>>>>>>> > > >>>>>>>>>>>> > > If you ignore the y1 result, we have >>>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) -> >>>>>>>>>>>> 1021347742030824.2 >>>>>>>>>>>> > > >>>>>>>>>>>> > > If you include the y1 result: >>>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32, >>>>>>>>>>>> -1) -> >>>>>>>>>>>> > > 1021347742030824.1 >>>>>>>>>>>> > >>>>>>>>>>>> > I somehow didn't type what I thought. I meant to say: if y0 >>>>>>>>>>>> is really close >>>>>>>>>>>> >>>>>>>>>>> to >>>>>>>>>>> >>>>>>>>>>>> > 0, there does not seem to be any point to invest in the third >>>>>>>>>>>> loop. (I am >>>>>>>>>>>> aware >>>>>>>>>>>> > that omitting y1 changes the result in some cases. I'm not >>>>>>>>>>>> arguing this). >>>>>>>>>>>> > >>>>>>>>>>>> > So in the example here, if I omit the third iteration, I get >>>>>>>>>>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32] >>>>>>>>>>>> > >>>>>>>>>>>> > y0 is the same, y1 differs slightly, but the end result is >>>>>>>>>>>> still >>>>>>>>>>>> > 1021347742030824.1. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> While I understand your desire to reduce the complexity, you >>>>>>>>>>>> are modifying an >>>>>>>>>>>> algorithm written by an expert. I think the burden is on you >>>>>>>>>>>> to prove that by >>>>>>>>>>>> removing the third iteration you do not change the value of y0. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Also, where is this coming from? In reality, how often will >>>>>>>>>>>> you compute >>>>>>>>>>>> >>>>>>>>>>> sin(x) >>>>>>>>>>> >>>>>>>>>>>> where x is very near a multiple of pi/2 (where the third >>>>>>>>>>>> iteration is needed)? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I suspect it occurs more often than we might expect, but also >>>>>>>>>>>> that if you're >>>>>>>>>>>> doing that, I think you're also computing zillions more values >>>>>>>>>>>> that are not a >>>>>>>>>>>> multiple of pi/2. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0 >>>>>>>>>>>> to 119. Thus >>>>>>>>>>>> out of 120 values, we have a multiple of pi just 8 times out of >>>>>>>>>>>> 120. If the >>>>>>>>>>>> >>>>>>>>>>> cost >>>>>>>>>>> >>>>>>>>>>>> of reduction for multiples of pi/2 AND the computation of sin >>>>>>>>>>>> were reduced to >>>>>>>>>>>> exactly zero, you would save about just 6.6% in runtime. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think there are more important things to look at. We need >>>>>>>>>>>> profile results. >>>>>>>>>>>> >>>>>>>>>>> We >>>>>>>>>>> >>>>>>>>>>>> need to understand what is really expensive in the reduction, >>>>>>>>>>>> not what we >>>>>>>>>>>> >>>>>>>>>>> think >>>>>>>>>>> >>>>>>>>>>>> is expensive. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I added back the third iteration, and tweaked some places, so >>>>>>>>>>> that the runtime >>>>>>>>>>> is now down to 16ms (vs the current 12ms). >>>>>>>>>>> >>>>>>>>>>> https://codereview.chromium.org/303753002/ >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> -- >>>>> -- >>>>> v8-dev mailing list >>>>> [email protected] >>>>> http://groups.google.com/group/v8-dev >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "v8-dev" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> -- >>>> -- >>>> v8-dev mailing list >>>> [email protected] >>>> http://groups.google.com/group/v8-dev >>>> --- >>>> You received this message because you are subscribed to a topic in the >>>> Google Groups "v8-dev" group. >>>> To unsubscribe from this topic, visit >>>> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe. >>>> To unsubscribe from this group and all its topics, send an email to >>>> [email protected]. >>>> >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >>> -- >>> v8-dev mailing list >>> [email protected] >>> http://groups.google.com/group/v8-dev >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "v8-dev" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- >> -- >> v8-dev mailing list >> [email protected] >> http://groups.google.com/group/v8-dev >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "v8-dev" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > -- > -- > v8-dev mailing list > [email protected] > http://groups.google.com/group/v8-dev > --- > You received this message because you are subscribed to the Google Groups > "v8-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
