[ 
https://issues.apache.org/jira/browse/NUMBERS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479453#comment-17479453
 ] 

Alex Herbert commented on NUMBERS-183:
--------------------------------------

h2. Binomial Coefficient Double

I have written a test comparing the computation of the binomial coefficient 
double value using the simple loop against using the precomputed factorials:

 
{code:java}
// Loop

double result = 1;
for (int i = 1; i <= k; i++) {
    result *= n - k + i;
    result /= i;
}
return Math.floor(result + 0.5f);


// Factorials

// Small factorials are tabulated exactly
// n! / m! / (n-m)!
double result = Factorial.uncheckedFactorial(n) /
                Factorial.uncheckedFactorial(m) /
                Factorial.uncheckedFactorial(n - m);
return Math.floor(result + 0.5f);  {code}
Assuming that any n < 67 is handled using the exact long computation in 
BinomialCoefficient I have tested all coefficients for 67 <= n <= 170. This is 
6136 coefficients ignoring symmetric values. The maximum and root mean squared 
error are:

 
||Method||max||rms||
|Loop|16|4.524|
|Factorials|3|0.4180|

This indicates that the use of the precomputed factorials does not reduce 
errors and it will have a significant speed increase when min(k, n-k) is large.

 
h2. Log Binomial Coefficient

Following the logic in the description, the log of the binomial coefficient is:

 
{noformat}
   n!
---------    = 1 / (k * beta(k, n-k+1))
k! (n-k)!     

log(C(n, k)) = -log(k) - log(beta(k, n-k+1)){noformat}
I implemented the LogBinomialCoefficient class using LogBeta. It returns 
results within 1 ulp of the result computed to 64 digits using maxima with 
either:

 

 
{noformat}
bfloat(log(binomial(n, k)))
bfloat(log(gamma(n+1)) - log(gamma(k+1)) - log(gamma(n-k+1))){noformat}
At modest n and k the maxima binomial function is exact. The gamma function is 
required when n and k are very large. E.g. The binomial function (or beta 
function) in maxima will run for several minutes without returning for 
binomial(2147483647, 1073741824). The gamma result is 1.488522224e9 and this is 
matched by using LogBeta.

Note the gamma function for 2^31 is 1.12845925e+19107526488. With an exponent 
this large the computation may be impractical using a simple loop with 
arbitrary precision which may be what maxima is computing.

 

 

 

 

> Improve binomial coefficient classes
> ------------------------------------
>
>                 Key: NUMBERS-183
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-183
>             Project: Commons Numbers
>          Issue Type: Improvement
>          Components: combinatorics
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.1
>
>
> The binomial coefficient classes can be improved by several small changes.
> h2. Avoid recursion
> All classes use a recursive method call to compute either C(n, k) or C(n, n - 
> k) depending which is smaller. The methods can avoid recursion using m = 
> min(k, n - k) and computing C(n, m).
> h2. Threshold for overflow
> The LogBinomialCoeffiecient knows that C(1030, 515) overflows a double. But 
> the BinomialCoefficientDouble does not. It will compute C(100000, 50000) with 
> a loop of 50000 computations and return infinity. The class should be updated 
> with some better thresholds:
> {noformat}
> C(1030, 515)           ~ 2.85e308   =>  m >= 515  is overflow
> C(2147483647, 37) * 37 ~ 5.13e303   =>  m < 38    cannot overflow
> {noformat}
> h2. Overflow protection
> Better overflow protection is required for large results. These all currently 
> fail as they overflow to infinity in the intermediate computation:
> {code:java}
>     @ParameterizedTest
>     @CsvSource({
>         "1040, 450, 2.3101613255412135615e307",
>         "1029, 514, 1.4298206864989040819e308",
>         "1786388282, 38, 7.187239013254065384599502085053593e306",
>         "1914878305, 38, 100.6570419073661447979173868523364e306",
>         "1179067476, 39, 30.22890249420109200962786203300876e306",
>         "2147483647, 37, 1.388890512412231479281222156415993e302",
>     })
>     void testBinomialCoefficientLargeK(int n, int k, double nCk) {
>         final double x = BinomialCoefficientDouble.value(n, k);
>         Assertions.assertEquals(nCk, x, Math.ulp(nCk) * 10,
>             () -> "ULP error: " + (x - nCk) / Math.ulp(nCk));
>     }
> {code}
> A test for overflow in the loop can rearrange the terms to compute the 
> correct result.
> h2. Use the known factorials
> The BinomialCoefficientDouble class can exploit the known factorials stored 
> in the Factorial class and avoid a loop computation for small n:
> {code:java}
> if (n <= 170) {
>     // Use fast table lookup:
>     result = Factorial.doubleValue(n);
>     // Smaller m will have a more accurate factorial and may be exact
>     result /= Factorial.doubleValue(m);
>     result /= Factorial.doubleValue(n - m);
> }{code}
> The result appears to be as accurate as the current implementation on a 
> partial test dataset. The complete dataset of factorials will require 
> approximately (sum i=1 ... 170/2) * 2 == 7310 values so could be tested to 
> see if this fast method is robust.
> h2. Avoid cancellation in summations
> The LogBinomialCoefficient sum will suffer from cancellation:
> {code:java}
>         // Sum for values that could overflow.
>         double logSum = 0;
>         // n! / (n - k)!
>         for (int i = n - k + 1; i <= n; i++) {
>             logSum += Math.log(i);
>         }
>         // Divide by k!
>         for (int i = 2; i <= k; i++) {
>             logSum -= Math.log(i);
>         }
>         return logSum;
> {code}
> The divide by k! should be summed separately and subtracted as a single sum. 
> This could be performed using the LogFactorial class which can cache results 
> for reuse.
> h2. Use the gamma functions
> Note: The combinatorics package depends on the gamma package. The factorial 
> can be computed using the gamma function:
> {noformat}
> n! = gamma(n+1)
> {noformat}
> This is used in the LogFactorial class which delegates to LogGamma.
> The binomial coefficient can be computed using the beta function:
> {noformat}
>    n!                gamma(n+1)               gamma(k+1) * gamma(n-k+1)
> ---------   = ------------------------- = 1 / -------------------------
> k! (n-k)!     gamma(k+1) * gamma(n-k+1)              gamma(n+1)
>             = 1 / (k * beta(k, n-k+1)) = 1 / ((n-k) * beta(k+1, n-k))
> beta(a, b) = gamma(a) * gamma(b) / gamma(a+b)
> {noformat}
> Note: In converting the Boost beta function implementation (NUMBERS-181) I 
> have tested using the beta function to compute the binomial coefficient. It 
> is not more accurate than using the current implementation within a loop to 
> compute the multiplication of terms.
> I have not tested using LogBeta to compute the LogBinomialCoefficient. The 
> accuracy of this should be checked against using the current summation of 
> logs within a loop.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to