Dear All, I have been trying to simulate data from a fitted glm using the simulate() function (version details at the bottom). This works for lm() fits and even for lmer() fits (in lme4). However, for glm() fits its output does not make sense to me -- am I missing something or is this a bug?
Consider the following count data, modelled as gaussian, poisson and binomial responses: counts <- data.frame( count = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3, 15, 11, 9), batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11, 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6), weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3, 376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5, 409.7, 375, 318.5)) fit.gaus <- lm(sqrt(count) ~ weight, data=counts) simulate(fit.gaus)[[1]] # [1] 2.3280287 -0.7097561 2.5370403 2.3935569 -0.7918554 2.8043848 # [7] 1.7306636 4.2524854 0.3390859 1.4725342 1.1209236 0.1805066 # [13] 3.6348710 1.9740871 2.0315499 2.9240702 3.9282456 1.2174952 # [19] 0.1513106 0.6562989 fit.poisson <- glm(count ~ weight, family=poisson, data=counts) simulate(fit.poisson)[[1]] # [1] 2.11172412 5.68833210 4.32514109 4.93455140 8.14232584 5.05559585 # [7] 3.15849145 7.10833484 -0.24817021 4.32475164 -0.08093695 5.06006941 # [13] 7.38880304 6.47238760 9.53732036 2.59060450 5.10484831 3.21278865 # [19] -0.25775316 4.40702454 fit.binom <- glm(0!=count ~ weight, family=binomial, data=counts) simulate(fit.binom)[[1]] # [1] 0.47568732 1.26957295 0.38295877 -0.61108990 -1.03040311 0.73164463 # [7] -0.08117270 0.61211072 0.05559046 0.12979599 -0.29521387 1.76792413 # [13] 0.52791230 -1.10505180 -1.61753057 0.89083075 0.70100867 0.00962979 # [19] -0.43537313 0.38288809 For simulate() to have a consistent definition, shouldn't Poisson and binomial produce only positive integer responses? Below is a hacked together function that does what I would expect (note that my binomial works for only binary response glms): simulate.glm.hack <- function(fit, nsim=1, seed=NULL) { if (!is.null(seed)) set.seed(seed) n <- length(fitted(fit)) theta <- model.matrix(fit$terms, data=fit$data) %*% coef(fit) ymat <- matrix(NA, nrow=n, ncol=nsim) for (s in 1:nsim) { if ("poisson"==fit$family$family) { ymat[,s] <- rpois(n, fit$family$linkinv(theta)) } else if ("binomial"==fit$family$family) { ymat[,s] <- rbinom(n, 1, fit$family$linkinv(theta)) } } as.data.frame(ymat) } # Eg: simulate.glm.hack(fit.poisson)[[1]] # [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6 simulate.glm.hack(fit.binom)[[1]] # [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1 If I am missing something please direct me to the relevant documentation. Cheers, Will Valdar my version details... > sessionInfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252 attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7] "base" =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Dr William Valdar ++44 (0)1865 287 589 Wellcome Trust Centre [EMAIL PROTECTED] for Human Genetics, Oxford www.well.ox.ac.uk/~valdar ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.