On Fri, Sep 10, 2010 at 9:51 AM, Soeren Sandmann <[email protected]> wrote:
> ...
> I'd guess on most systems, arithmetic on floating point values will be
> much faster than arithmetic on int128s, simply because an int128 would
> have to be built up from two 64 bit, or four 32 bit registers.

I did some benchmarking on a 64-bit CPU and pushed four branches with
different mixes of fixed and floating point math:

 - wip/radial-fixsqrt does all the computations in fixed point. It is
   slower because of the iterative sqrt and the integer division. I
   think it can be interesting as a reference, but its performance is
   not acceptable.

 - wip/radial does all the computations before the sqrt in fixed point
   (thus it requires some 128-bit variables, but only for additions,
   which seem to be fast on my Core i7). Most of the time is spent
   (wasted?) in the sqrt and in the int128->double conversion. The
   conversion could probably be made faster (currently it is a call to
   a fallback function in the system library), but doing just the
   final computations in double precision seems to be fast enough and
   should not lose much precision in the interesting cases.

 - wip/radial-float2 computes the discriminant in double precision
   (instead of 128-bit fixed point) and should be both fast (almost
   two times faster than wip/radial) and accurate. Does it look good?
   I'm a little worried about the int64->double conversions in this
   branch. Are there architectures where they might be a problem?

 - in wip/radial-float I tried to keep all the variables of the inner
   loop in floating point to see how much this affects the
   performance. I get small speed improvements, but I don't think they
   justify this branch (the accuracy guarantees in this case are very
   loose, since errors accumulate quadratically with the number of
   iterations).

All the branches could be affected by problems when computing the
solutions of the 2nd degree equation, since I used the "school
formula", which is not numerically stable. I will try to find out if
there are interesting cases in which this can be a problem.

If somebody can confirm that wip/radial-float2 should work fine on
most 32-bit architectures, I'll clean it up some more so that it is
ready to be merged.

Again, thank you for your suggestions (which were in fact very
insightful).

Andrea
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
