Attached is a WIP patch to improve the performance of numeric sqrt() and ln(), which also makes a couple of related improvements to div_var_fast(), all of which have knock-on benefits for other numeric functions. The actual impact varies greatly depending on the inputs, but the overall effect is to reduce the run time of the numeric_big regression test by about 20%.
Additionally it improves the accuracy of sqrt() -- currently sqrt() sometimes rounds the last digit of the result the wrong way, for example sqrt(100000000000000010000000000000000) returns 10000000000000001, when the correct answer should be 10000000000000000 to zero decimal places. With this patch, sqrt() guarantees to return the result correctly rounded to the last digit for all inputs. The main change is to sqrt_var(), which now uses a different algorithm [1] for better performance than the Newton-Raphson method. Actually I've re-cast the algorithm from [1] into an iterative form, rather than doing it recursively, as it's presented in that paper. This improves performance further, by avoiding overheads from function calls and copying numeric variables around. Also, IMO, the iterative form of the algorithm is much more elegant, since it works by making a single pass over the input digits, consuming them one at a time from most significant to least, producing a succession of increasingly more accurate approximations to the square root, until the desired precision is reached. For inputs with a handful of digits, this is typically 3-5 times faster, and for inputs with more digits the performance improvement is larger (e.g. sqrt(2e131071) is around 10 times faster). If the input is a perfect square, with a result having a lot of trailing zeros, the new algorithm is much faster because it basically has nothing to do in later iterations (e.g., sqrt(64e13070) is about 600 times faster). Another change to sqrt_var() is that it now explicitly supports a negative rscale, i.e., rounding before the decimal point. This is exploited by ln_var() in its argument reduction stage -- ln_var() reduces all inputs to the range (0.9, 1.1) by repeatedly taking the square root. For very large inputs this can have an enormous impact, for example log(1e131071) currently takes about 6.5 seconds on my machine, whereas with this patch I can run it 1000 times in a plpgsql loop in about 90ms, so its around 70,000 times faster in that case. Of course, that's an extreme example, and for most inputs it's a much more modest difference (e.g., ln(2) is about 1.5 times faster). In passing, I also made a couple of optimisations to div_var_fast(), discovered while comparing it's performace with div_var() for various inputs. It's possible that there are further gains to be had in the sqrt() algorithm on platforms that support 128-bit integers, but I haven't had a chance to investigate that yet. Regards, Dean [1] https://hal.inria.fr/inria-00072854/document
numeric-sqrt-ln.patch
Description: Binary data