Some improvements to numeric sqrt() and ln()

Dean Rasheed Fri, 28 Feb 2020 00:16:23 -0800

Attached is a WIP patch to improve the performance of numeric sqrt()
and ln(), which also makes a couple of related improvements to
div_var_fast(), all of which have knock-on benefits for other numeric
functions. The actual impact varies greatly depending on the inputs,
but the overall effect is to reduce the run time of the numeric_big
regression test by about 20%.


Additionally it improves the accuracy of sqrt() -- currently sqrt()
sometimes rounds the last digit of the result the wrong way, for
example sqrt(100000000000000010000000000000000) returns
10000000000000001, when the correct answer should be 10000000000000000
to zero decimal places. With this patch, sqrt() guarantees to return
the result correctly rounded to the last digit for all inputs.

The main change is to sqrt_var(), which now uses a different algorithm
[1] for better performance than the Newton-Raphson method. Actually
I've re-cast the algorithm from [1] into an iterative form, rather
than doing it recursively, as it's presented in that paper. This
improves performance further, by avoiding overheads from function
calls and copying numeric variables around. Also, IMO, the iterative
form of the algorithm is much more elegant, since it works by making a
single pass over the input digits, consuming them one at a time from
most significant to least, producing a succession of increasingly more
accurate approximations to the square root, until the desired
precision is reached.

For inputs with a handful of digits, this is typically 3-5 times
faster, and for inputs with more digits the performance improvement is
larger (e.g. sqrt(2e131071) is around 10 times faster). If the input
is a perfect square, with a result having a lot of trailing zeros, the
new algorithm is much faster because it basically has nothing to do in
later iterations (e.g., sqrt(64e13070) is about 600 times faster).

Another change to sqrt_var() is that it now explicitly supports a
negative rscale, i.e., rounding before the decimal point. This is
exploited by ln_var() in its argument reduction stage -- ln_var()
reduces all inputs to the range (0.9, 1.1) by repeatedly taking the
square root. For very large inputs this can have an enormous impact,
for example log(1e131071) currently takes about 6.5 seconds on my
machine, whereas with this patch I can run it 1000 times in a plpgsql
loop in about 90ms, so its around 70,000 times faster in that case. Of
course, that's an extreme example, and for most inputs it's a much
more modest difference (e.g., ln(2) is about 1.5 times faster).

In passing, I also made a couple of optimisations to div_var_fast(),
discovered while comparing it's performace with div_var() for various
inputs.

It's possible that there are further gains to be had in the sqrt()
algorithm on platforms that support 128-bit integers, but I haven't
had a chance to investigate that yet.

Regards,
Dean

[1] https://hal.inria.fr/inria-00072854/document

numeric-sqrt-ln.patch
Description: Binary data

Some improvements to numeric sqrt() and ln()

Reply via email to