Dear all,

This is the first time I am posting to the r-devel list. On StackOverflow, they suggested that the strange behaviour of integrate() was more bug-like. I am providing a short version of the question (full one with plots: https://stackoverflow.com/q/55639401).

Suppose one wants integrate a function that is just a product of two density functions (like gamma). The support of the random variable is (-Inf, 0]. The scale parameter of the distribution is quite small (around 0.01), so often, the standard integration routine would fail to integrate a function that is non-zero on a very small section of the negative line (like [-0.02, -0.01], where it takes huge values, and almost 0 everywhere else). R’s integrate would often return the machine epsilon as a result. So I stretch the function around the zero by an inverse of the scale parameter, compute the integral, and then divide it by the scale. Sometimes, this re-scaling also failed, so I did both if the first result was very small.

Today when integration of the rescaled function suddenly yielded a value of 1.5 instead of 3.5 (not even zero). The MWE is below.

cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh, scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 #  Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value       : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value       : num 1.5 # What?!
# $ abs.error   : num 0.000145 # What?!
# $ subdivisions: int 2

It stop at just two subdivisions! The problem is, I cannot try various stabilising multipliers for the function because I have to compute this integral thousands of times for thousands of parameter values on thousands of sample windows for hundreds on models, so even in the super-computer cluster, this takes weeks. Besides that, reducing the rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether this guarantees success (and reducing it to 1e-7 slowed down the computations in some cases). And I have looked at the Fortran code of the quadrature just to see the integration rule, and was wondering.

How can I make sure that the integration routine will not produce such wrong results for such a function, and the integration will still be fast?

Yours sincerely,
Andreï V. Kostyrka

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to