On 27-Apr-04 Peter Dalgaard wrote: > (Ted Harding) <[EMAIL PROTECTED]> writes: >> The documentation does not say definitely what p$se.fit is, >> only calling it "Estimated standard errors". I *believe* >> this means, at each value of X, the SE in the estimation >> of P[y=1] taking account of the joint uncertainty in the >> estimation of 'a' and 'b' in the relation >> >> probit(P) = a + b*X >> >> Can someone confirm that this really is so? > > Pretty accurate, I'd say. > > Basically, the fitted value is a function of the estimated parameters. > Asymptotically, the latter are approximately normally distributed with > a small dispersion so that the function is effectively linear and you > can approximate the distribution of the fitted value with a normal > distribution.
Hmm!! This is a bit of a minefield! I've been fitting a binomial regression g <- glm(y~x, family=binomial(link=probit)) followed by a predicition p <- predict.glm(g, newx, type="response", se.fit=TRUE) Plotting the fit p$fit and the inplied "confidence bands" p$fit +- 2*p$se.fit gave rather narrow (I thought) bands for the prediction, so I did a simulation of 20 mvrnorm draws from the joint distribution of a and b (using their SDs and correlation from summary.glm). I then plotted the corresponding curves pnorm(a + b*x) and got a set of 20 curves about half of which lay well outside the above "condfidence band", some quite a long way off. (This, I must say, is what I had intuitively been expecting to find in the first place!) I then turned to the V&R book, and happened on the section "Problems with binomial GLMs" in 7.2; and discovered "profile". So this led me to "confint" in MASS via ?profile and ?profile.glm. The results of confint(g) gave me a and b for the lower (2.5%) and upper (97.5%) curve. When plotted, these curves lay well outside the "confidence bands" obtained from "predict.glm" and were much more realistically related to my simulations (19 of my simulated curves nicely packed the space between tthe two "confint" curves, and one lay just outside -- couldn't have hoped for a result more close to expectation!). Nevertheless, I don't think my data were all that few or nasty: 23 x-values, roughly equally spaced, with about 12 0/1 results at each, and numbers of responses going up as 0 0 0 0 0 0 0 0 0 0 2 0 0 1 0 2 4 5 8 8 9 4 10 So I tend to conclude that the "predict.glm(...,se.fit=TRUE)" method should perhaps be avoided in favour of using "confint", though I see no indication that "confint" respects the covariance of the parameter estimates (intercept and slope) whereas the "predict" method in theory does. Maybe I'll have another go, after centering the x-values at their mean ... Anyway, comments would be appreciated! Best wishe to all, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 167 1972 Date: 28-Apr-04 Time: 21:32:11 ------------------------------ XFMail ------------------------------ ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html