On Thu, Jun 12, 2014 at 12:37 AM, Yang Guo <[email protected]> wrote:
> > > On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <[email protected]> wrote: > >> But what is the function call for? >> > > The function call is the slow case for when double to integer (truncated) > conversion using cvttsd2si overflows. > Ah, ok. If we changed the if statement at line 222 to say abs(x) < 1647099.9999, would the compiler be able to derive that n cannot overflow? That be another opportunity to micro-optimize this routine. Actually some of the if's could be replaced with float comparisons instead of integer comparisons, if that makes a difference. > >> And for the record, I'm attaching the new profile results. >> >> >> >> >> On Mon, Jun 9, 2014 at 11:57 PM, Yang Guo <[email protected]> wrote: >> >>> Yep. That piece of code does a double to integer conversion. >>> >>> >>> On Tue, Jun 10, 2014 at 12:08 AM, <[email protected]> wrote: >>> >>>> (Yang sent me the new profile result in private email) >>>> >>>> This looks much better. The only question I have now is what the code >>>> in 0xed to 0x105 is doing. Something related to converting to a float to an >>>> integer; perhaps boxing the result? >>>> >>>> Otherwise, it looks roughly like what gcc does, with a few extra moves >>>> and the bounds checks for kTrig. >>>> >>>> >>>> On Friday, June 6, 2014 9:28:53 AM UTC-7, Yang Guo wrote: >>>> >>>>> Argh. I even prepared it, but totally forgot to send it to you. Will >>>>> do when I get home. >>>>> >>>>> Yang >>>>> On Jun 6, 2014 6:03 PM, "Raymond Toy" <[email protected]> wrote: >>>>> >>>>>> Thanks for clarifying these results and for providing the modified >>>>>> 3d-morph. >>>>>> >>>>>> When you get a chance could you provide new profile results with >>>>>> MathRound removed? And can you provide the pref results with the event >>>>>> counters enabled so we can see cache effects? >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> On Fri, Jun 6, 2014 at 6:56 AM, Yang Guo <[email protected]> wrote: >>>>>> >>>>>>> Hi Raymond, >>>>>>> >>>>>>> the modified 3d-morph is attached. >>>>>>> >>>>>>> The code from 0xa to 0x47 are a stack check (at the entry to >>>>>>> function to detect stack overflow) and unboxing the argument into a >>>>>>> double >>>>>>> register (double numbers are usually boxed in V8 and stored on the heap, >>>>>>> except for certain kinds of arrays and in optimized code). >>>>>>> >>>>>>> The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it with >>>>>>> a floor (updated CL) actually gives a slight boost. The modified >>>>>>> 3d-morph >>>>>>> goes from 8250ms to 8050ms, and the unmodified one now alternates >>>>>>> between >>>>>>> 15ms and 16ms. >>>>>>> >>>>>>> Yes, those comparisons are bounds checks. Unfortunately, >>>>>>> out-of-bound reads on typed arrays in Javascript should return >>>>>>> undefined. >>>>>>> We already eliminate some of the redundant bounds checks, but not all >>>>>>> can >>>>>>> be eliminated. Of course the generated code for Javascript is a lot >>>>>>> larger >>>>>>> than that for C, no surprise there. Javascript is a dynamic language >>>>>>> after >>>>>>> all. And are right in that we probably should focus on the things that >>>>>>> add >>>>>>> overhead. >>>>>>> >>>>>>> Moving the calculation to C wouldn't make things faster though, >>>>>>> since the switch to C code is rather expensive, and C code cannot be >>>>>>> inlined either. >>>>>>> >>>>>>> Yang >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Jun 6, 2014 at 12:52 AM, Raymond Toy <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Can you explain what some of the code is in the prof results you >>>>>>>> sent? >>>>>>>> >>>>>>>> What is all the stuff from address 0xa to 0x47 doing? >>>>>>>> >>>>>>>> What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound, but >>>>>>>> it seems that can be done with just one or two instructions. And the >>>>>>>> original code was Math.floor(x + 0.5). If MathRound is rounding to >>>>>>>> even, >>>>>>>> then that is not what we want. >>>>>>>> >>>>>>>> There are some various bits of code comparing ebx to small positive >>>>>>>> constants Is that a bounds check on the kTrig array? >>>>>>>> >>>>>>>> When I compare this disassembly with what gcc produces on the >>>>>>>> original fdlibm code, gcc seems to be much smaller and simpler. The >>>>>>>> actual >>>>>>>> computation parts, however, appear roughly equal. It's all the stuff >>>>>>>> around it that makes V8 probably run slower than I would have expected. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Here's a profile of the 64bit build. MathSinSlow takes most of the >>>>>>>>> time, and the file includes a disassembly of the generated code, with >>>>>>>>> each >>>>>>>>> instruction annotated with profiling stats. Note that this runs an >>>>>>>>> altered >>>>>>>>> version of SunSpider's 3d-morph to run longer, giving more profiling >>>>>>>>> samples. >>>>>>>>> >>>>>>>>> Yang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Jun 5, 2014 at 5:23 PM, <[email protected]> wrote: >>>>>>>>> >>>>>>>>>> On 2014/06/04 16:30:37, Raymond Toy wrote: >>>>>>>>>> >>>>>>>>>>> On 2014/06/04 07:19:29, Yang wrote: >>>>>>>>>>> > On 2014/06/03 16:51:30, Raymond Toy wrote: >>>>>>>>>>> > > On 2014/06/03 07:01:45, Yang wrote: >>>>>>>>>>> > > > https://codereview.chromium.org/303753002/diff/40001/src/ >>>>>>>>>>> math.js >>>>>>>>>>> > > > File src/math.js (right): >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> https://codereview.chromium.org/303753002/diff/40001/src/mat >>>>>>>>>>> h.js#newcode262 >>>>>>>>>>> > > > src/math.js:262: } >>>>>>>>>>> > > > On 2014/06/02 17:26:11, Raymond Toy wrote: >>>>>>>>>>> > > > > As you mentioned via email, you've removed the 3rd >>>>>>>>>>> iteration. This is >>>>>>>>>>> > really >>>>>>>>>>> > > > > needed if you want to be able to reduce multiples of >>>>>>>>>>> pi/2 accurately. >>>>>>>>>>> > > > >>>>>>>>>>> > > > That's true. However, the reduction step is not exposed as >>>>>>>>>>> a library >>>>>>>>>>> > function. >>>>>>>>>>> > > > From what I have seen, the third step seems to only affect >>>>>>>>>>> y1. With a y0 >>>>>>>>>>> > > really >>>>>>>>>>> > > > close to y1, it does not change the result of sine or >>>>>>>>>>> cosine. This is >>>>>>>>>>> >>>>>>>>>> also >>>>>>>>>> >>>>>>>>>>> > why >>>>>>>>>>> > > I >>>>>>>>>>> > > > was asking for a test case where removing this third step >>>>>>>>>>> would make a >>>>>>>>>>> > > > difference. >>>>>>>>>>> > > >>>>>>>>>>> > > I don't understand what you mean by "y0 really close to y1". >>>>>>>>>>> What are you >>>>>>>>>>> > > saying? >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > tan(Math.PI*45/2) requires the 3rd iteration. >>>>>>>>>>> ieee754_rem_pio2 returns >>>>>>>>>>> > > [45, -9.790984586812941e-16, -6.820314736619894e-32] >>>>>>>>>>> > > >>>>>>>>>>> > > If you ignore the y1 result, we have >>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16, 0e0, -1) -> >>>>>>>>>>> 1021347742030824.2 >>>>>>>>>>> > > >>>>>>>>>>> > > If you include the y1 result: >>>>>>>>>>> > > kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32, >>>>>>>>>>> -1) -> >>>>>>>>>>> > > 1021347742030824.1 >>>>>>>>>>> > >>>>>>>>>>> > I somehow didn't type what I thought. I meant to say: if y0 is >>>>>>>>>>> really close >>>>>>>>>>> >>>>>>>>>> to >>>>>>>>>> >>>>>>>>>>> > 0, there does not seem to be any point to invest in the third >>>>>>>>>>> loop. (I am >>>>>>>>>>> aware >>>>>>>>>>> > that omitting y1 changes the result in some cases. I'm not >>>>>>>>>>> arguing this). >>>>>>>>>>> > >>>>>>>>>>> > So in the example here, if I omit the third iteration, I get >>>>>>>>>>> > [45, -9.790984586812941e-16, -6.820199415561299e-32] >>>>>>>>>>> > >>>>>>>>>>> > y0 is the same, y1 differs slightly, but the end result is >>>>>>>>>>> still >>>>>>>>>>> > 1021347742030824.1. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> While I understand your desire to reduce the complexity, you are >>>>>>>>>>> modifying an >>>>>>>>>>> algorithm written by an expert. I think the burden is on you to >>>>>>>>>>> prove that by >>>>>>>>>>> removing the third iteration you do not change the value of y0. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Also, where is this coming from? In reality, how often will you >>>>>>>>>>> compute >>>>>>>>>>> >>>>>>>>>> sin(x) >>>>>>>>>> >>>>>>>>>>> where x is very near a multiple of pi/2 (where the third >>>>>>>>>>> iteration is needed)? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I suspect it occurs more often than we might expect, but also >>>>>>>>>>> that if you're >>>>>>>>>>> doing that, I think you're also computing zillions more values >>>>>>>>>>> that are not a >>>>>>>>>>> multiple of pi/2. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0 >>>>>>>>>>> to 119. Thus >>>>>>>>>>> out of 120 values, we have a multiple of pi just 8 times out of >>>>>>>>>>> 120. If the >>>>>>>>>>> >>>>>>>>>> cost >>>>>>>>>> >>>>>>>>>>> of reduction for multiples of pi/2 AND the computation of sin >>>>>>>>>>> were reduced to >>>>>>>>>>> exactly zero, you would save about just 6.6% in runtime. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think there are more important things to look at. We need >>>>>>>>>>> profile results. >>>>>>>>>>> >>>>>>>>>> We >>>>>>>>>> >>>>>>>>>>> need to understand what is really expensive in the reduction, >>>>>>>>>>> not what we >>>>>>>>>>> >>>>>>>>>> think >>>>>>>>>> >>>>>>>>>>> is expensive. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I added back the third iteration, and tweaked some places, so >>>>>>>>>> that the runtime >>>>>>>>>> is now down to 16ms (vs the current 12ms). >>>>>>>>>> >>>>>>>>>> https://codereview.chromium.org/303753002/ >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> -- >>>> -- >>>> v8-dev mailing list >>>> [email protected] >>>> http://groups.google.com/group/v8-dev >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "v8-dev" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> -- >>> -- >>> v8-dev mailing list >>> [email protected] >>> http://groups.google.com/group/v8-dev >>> --- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "v8-dev" group. >>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> [email protected]. >>> >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- >> -- >> v8-dev mailing list >> [email protected] >> http://groups.google.com/group/v8-dev >> --- >> You received this message because you are subscribed to the Google Groups >> "v8-dev" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > -- > -- > v8-dev mailing list > [email protected] > http://groups.google.com/group/v8-dev > --- > You received this message because you are subscribed to a topic in the > Google Groups "v8-dev" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
