[Rd] "simulate" does not include variability in parameter estimation

Spencer Graves Thu, 26 Dec 2019 20:15:23 -0800

Hello, All:

The default "simulate" method for lm and glm seems to ignore the sampling variance of the parameter estimates; see the trivial lm and glm examples below. Both examples estimate a mean with formula = x ~ 1. In both cases, the variance of the estimated mean is 1.

* In the lm example with x0 = c(-1, 1), var(x0) = 2, and var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064. Shouldn't it be 3 = var(mean(x0)) + var(x0) = (2/2) + 2?

* In the glm example with x1 = 1, var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't it be 2 = var(glm estimate of the mean) + var(simulated Poisson distribution) = 1 + 1?
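
The calculation behind the lm bullet can be sketched by hand. This is only an illustration, assuming the intercept is drawn from a normal approximation N(coef, vcov) and the residual standard deviation is held fixed at its estimate; the names mu_star and y_star are made up for this sketch and are not part of stats.

```r
# Sketch only: propagate parameter uncertainty by hand for the lm example.
set.seed(1)
x0   <- c(-1, 1)
fit0 <- lm(x0 ~ 1)
nsim <- 10000

# Step 1: draw the intercept from its approximate sampling distribution.
# Assumption: normal with mean coef(fit0) and variance vcov(fit0).
mu_star <- rnorm(nsim, mean = coef(fit0), sd = sqrt(vcov(fit0)[1, 1]))

# Step 2: draw one observation around each simulated mean, with the
# residual standard deviation fixed at its estimate, sigma(fit0).
y_star <- rnorm(nsim, mean = mu_star, sd = sigma(fit0))

# Variance including parameter uncertainty (expected near 1 + 2).
var(y_star)

# Variance from the default method (residual variance only).
var(unlist(simulate(fit0, nsim, 1)))
```

The first variance should come out near 3 and the second near 2, which is the gap the question is about.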

I'm asking because I've recently written "simulate" methods for objects of class stats::glm and BMA::bic.glm, where my primary interest was simulating the predicted mean with "newdata". I'm doing this so I can get Monte Carlo prediction intervals. My current code for "simulate.glm" and "simulate.bic.glm" is available in the development version of the "Ecfun" package on GitHub:



https://github.com/sbgraves237/Ecfun

Comparing my new code with "stats:::simulate.lm" raises the following questions in my mind regarding "simulate" of a fit object:

1. Shouldn't "simulate" start by simulating the random variability in the estimated parameters? I need that for my current application. If a generic "simulate" function should NOT include this, what should we call something that does include this? And how does the current stats:::simulate.lm behavior fit with this?

2. Shouldn't "simulate" of a model fit include an option for "newdata"? I need that for my application.

3. By comparing with "predict.glm", I felt I needed an argument 'type = c("link", "response")'. "predict.glm" has an argument 'type = c("link", "response", "terms")'. I didn't need "terms", so I didn't take the time to code that. However, a general "simulate" function should probably include that.

4. My application involves assumed Poisson counts. I need to simulate those as well. If I combined those with "simulate.glm", what would I call them? I can't use the word "response", because that's already used with a different meaning. Might "observations" be the appropriate term?
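
To make the four questions concrete, here is a rough sketch of what such a function might look like. It is not the Ecfun code and not anything in stats: the name simulate_glm_sketch, its arguments, and the "observations" label are hypothetical. It assumes a normal approximation N(coef, vcov) for the coefficients, ignores offsets and weights, and only draws counts for the Poisson family.

```r
# Hypothetical sketch, not the Ecfun or stats implementation.
simulate_glm_sketch <- function(object, nsim = 1, newdata = NULL,
                                type = c("link", "response", "observations")) {
  type <- match.arg(type)
  beta <- coef(object)
  V    <- vcov(object)

  # Design matrix for the fitted data, or for newdata if supplied.
  X <- if (is.null(newdata)) {
    model.matrix(object)
  } else {
    model.matrix(delete.response(terms(object)), newdata)
  }

  # Draw nsim coefficient vectors from N(beta, V) using the Cholesky
  # factor of V; each column of beta_star is one draw.
  p <- length(beta)
  z <- matrix(rnorm(p * nsim), nrow = p)
  beta_star <- beta + t(chol(V)) %*% z

  # Linear predictor: one row per case, one column per simulation.
  eta <- X %*% beta_star
  if (type == "link") return(eta)

  # Mean on the response scale via the inverse link.
  mu <- family(object)$linkinv(eta)
  dim(mu) <- dim(eta)
  if (type == "response") return(mu)

  # Simulated counts around each simulated mean (Poisson only here).
  stopifnot(family(object)$family == "poisson")
  matrix(rpois(length(mu), mu), nrow = nrow(eta))
}

# Usage with the Poisson example from this message.
set.seed(1)
x1   <- 1
fit1 <- glm(x1 ~ 1, poisson)
link <- simulate_glm_sketch(fit1, nsim = 10000, type = "link")
obs  <- simulate_glm_sketch(fit1, nsim = 10000, type = "observations")
var(as.vector(link))  # should be near vcov(fit1)
var(as.vector(obs))
```

On the link scale the simulated variance should be close to vcov(fit1). Because the log link is nonlinear, the variance of the simulated counts need not equal a simple sum of the two components.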



      What do you think?
      Thanks,
      Spencer Graves


> x0 <- c(-1, 1)
> var(x0)
[1] 2
> fit0 <- lm(x0~1)
> vcov(fit0)
            (Intercept)
(Intercept)           1
> sim0 <- simulate(fit0, 10000, 1)
> var(unlist(sim0))
[1] 2.006408
> x1 <- 1
> fit1 <- glm(x1~1, poisson)
> coef(fit1)
 (Intercept)
4.676016e-11
> exp(coef(fit1))
(Intercept)
          1
> vcov(fit1)
            (Intercept)
(Intercept)   0.9999903
> sim1 <- simulate(fit1, 10000, 1)
> var(unlist(sim1))
[1] 1.00617
> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.2

Matrix products: default

BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib


locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.2 tools_3.6.2

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
