On Tue, Feb 02, 2010 at 01:37:46PM +0100, Petr Savicky wrote: > I would like to add some more information concerning the patch C > to the function choose() proposed in the email > https://stat.ethz.ch/pipermail/r-devel/2009-December/056177.html > > The patch uses transformations of choose(n, k), which are described in > http://www.cs.cas.cz/~savicky/R-devel/formulas_choose.pdf > > The accuracy of the modified function choose(n, k) may be verified > on randomly generated examples using Rmpfr package > http://cran.at.r-project.org/web/packages/Rmpfr/index.html > and the script > http://www.cs.cas.cz/~savicky/R-devel/test_choose_2.R
Let me add an explanation of the script test_choose_2.R. The script generates a vector of random real numbers n, which are from a continuous distribution, but concentrate near integer values. The original implementation of choose(m, k) considers m as an integer, if abs(m - round(m)) <= 1e-7. The vector n is generated so that the probability of abs(n[i] - round(n[i])) <= 1e-7 is approximately 0.1. The distribution of n[i] - round(n[i]) is symmetric around 0, so we get both n[i], which are close to an integer from below and from above. On the other hand, the probability of abs(n[i] - round(n[i])) >= 0.3 is approximately 0.1083404, so there are also numbers n[i], which are not close to an integer. The script calculates choose(n, k) for k in 0:209 (an ad hoc upper bound) 1. using the modified choose(n, k) from patch C 2. using the expression n(n-1)...(n-k+1)/(1 2 ... k) with Rmpfr numbers of precision 100 bits. The relative difference of these two results is computed and its maximum over all n[i] and k from 0 to a given bound is reported. The bounds on k are chosen to be the numbers, whose last digit is 9, since the algorithm in choose(n,k) is different for k <= 29 and k >= 30. An upper bound of the relative rounding error of a single operation with Rmpfr numbers of precision 100 bits is (1 + 2^-100). Hence, an upper bound on the total relative error of n(n-1)...(n-k+1)/(1 2 ... k) is (1 + 2^-100)^(2*209) \approx (1 + 2 * 209 * 2^-100) \approx 1 + 3.297e-28. This is a negligible error compared to the errors reported by test_choose_2.R, so the reported errors are the errors of the patched choose(n, k). The errors reported by test_choose_2.R with patched choose(n,k) are in a previous email. Running test_choose_2.R with unpatched R version 2.11.0 (2010-02-01 r51089) produces larger errors. > source("test_choose_2.R") k <= 9 max rel err = Inf k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = 0.1111111 k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = Inf k <= 19 max rel err = Inf > source("test_choose_2.R") k <= 9 max rel err = 8.383718e-08 k <= 19 max rel err = 1.226306e-07 k <= 29 max rel err = 1.469050e-07 Error: segfault from C stack overflow The Inf relative errors occur in cases, where choose(n, k) calculates 0, but the correct result is not 0. The stack overflow error is sometimes generated due to an infinite sequence of transformations choose(n, k) -> choose(n, n-k) -> choose(n, round(n-k)) which occur if k = 30 and n = 60 - eps. The reason for the transformation choose(n, k) -> choose(n, n-k) is that k >= k_small_max = 30 n is treated as an integer in R_IS_INT(n) n-k < k_small_max So, choose(n, n-k) is called. There, we determine that n-k is almost an integer and since n-k is the second argument of choose(n,k), it is explicitly rounded to an integer. Since n = 2*k - eps, we have round(n-k) = round(k - eps) = k. The result is that we again call choose(n, k) and this repeats until the stack overflow. For example > choose(60 - 1e-9, 30) Error: segfault from C stack overflow Besides patch C, which corrects this stack overflow, also the previous patches A, B from https://stat.ethz.ch/pipermail/r-devel/2009-December/056154.html correct this, but have lower accuracy. Petr Savicky. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel