For the record, I was overlooking the obvious point that the statistical significance of the coefficient indicates only that the confidence interval for the *difference* in predicted probabilities will not overlap zero. It does not indicate that the two confidence intervals I was originally calculating will not overlap each other.
In case it's useful to others, the code below generates a confidence interval for the difference, on the response (probability) scale. - Malcolm # CI by SE: sapply(c(1.96,-1.96), function(a) plogis(sum(coef(mod1)) - a*summary(mod1)$coefficients[2,"Std. Error"]) - plogis(coef(mod1)[1])) # CI by simulation: quantile(plogis(sims%*%c(1,1)) - plogis(sims%*%c(1,0)), c(0, 0.025, 0.5, 0.975, 1)) On 3 May 2012, at 19:10, Bert Gunter wrote: > Your post suggests statistical confusion on several levels. But as > this concerns statistics, not R, consult your local statistician or > post on a statistical help list (e.g. stats.stackexchange.com). > > Incidentally, the short answer to your question about the overlap is: > why shouldn't they? You will hopefully receive a more informative > longer answer from the above suggested (or similar) resources. > > -- Bert > > > On Thu, May 3, 2012 at 9:37 AM, Malcolm Fairbrother > <m.fairbrot...@bristol.ac.uk> wrote: >> Dear list, >> >> I'm a bit perplexed why the 95% confidence bands for the predicted >> probabilities for units where x=0 and x=1 overlap in the following instance. >> >> I've simulated binary data to which I've then fitted a simple logistic >> regression model, with one covariate, and the coefficient on x is >> statistically significant at the 0.05 level. I've then used two different >> methods to generate 95% confidence bands for predicted probabilities, for >> each of two possible values of x. Given the result of the model, I expected >> the bands not to overlap… but they do. >> >> Can anyone please explain where I've gone wrong with the following code, OR >> why it makes sense for the bands to overlap, despite the statistically >> significant beta coefficient? There may be a good statistical reason for >> this, but I'm not aware of it. >> >> Many thanks, >> Malcolm Fairbrother >> >> >> n <- 120 >> set.seed(030512) >> x <- rbinom(n, 1, 0.5) >> dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x))) >> mod1 <- glm(ybe ~ x, dat, family=binomial) >> summary(mod1) # coefficient on x is statistically significant at the 0.05 >> level… almost at the 0.01 level >> >> pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T) >> with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = >> plogis(fit + 1.96*se.fit))) # confidence bands based on SEs >> >> # simulation-based confidence bands: >> sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, >> family=binomial)))) >> pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975))) >> pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975))) >> rbind(pred0, pred1) >> >> # the upper bound of the prediction for x=0 is greater than the lower bound >> of the prediction for x=1, using both methods ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.