On 07-Aug-10 09:29:41, Michael Bedward wrote:
> Thanks for that clarification Peter - much appreciated.
>
> Is there an R function that you'd recommend for calculating
> more valid CIs ?
> Michael

It depends on what you want to mean by "more valid"!

If you have a 95% CI for the linear predictor (say L(x) at X=x), then the probability that the CI will include the true value of L(x) is 95% (more or less accurately, depending on what approximation, if any, was used to obtain the CI). Thus, if A(Y) and B(Y) are the lower and upper limits of a 95% CI for L(x) as functions of the data Y,

  P(A(Y) < L(x) < B(Y)) = 0.95

(to within approximation), and this may be asymmetrical in that we may have

  P(A(Y) > L(x)) = 0.05 - P(B(Y) < L(x)) != 0.025

(e.g. it may come out as a 1%:4% split of the 5%).

The response probability P(x) = P(Y=1 | X=x) will be a monotonic (increasing) function F(L(x)) of L(x) -- e.g. for the logistic, F(L) = exp(L)/(1+exp(L)), increasing from 0 to 1. Then {F(A(Y)), F(B(Y))} is a 95% CI for P(x) = F(L(x)), since

  P[A(Y) < L(x) < B(Y)] = P[F(A(Y)) < F(L(x)) = P(x) < F(B(Y))].

Also,

  P[A(Y) > L(x)] = P[F(A(Y)) > F(L(x)) = P(x)]

and

  P[B(Y) < L(x)] = P[F(B(Y)) < F(L(x)) = P(x)]

for exactly the same reason (monotonicity of F). Hence the split of the 5% between left tail and right tail on the response scale P(x) = F(L(x)) is exactly the same as the split on the linear predictor scale L(x).

Therefore, on that front (comparison of probabilities of coverage), the CI transformed to the response scale {F(A(Y)), F(B(Y))} is exactly as valid as the CI {A(Y), B(Y)} on the original linear predictor scale. In particular, if the latter is "equi-tailed" (2.5% on either side) then the former will be too.

If that is what you mean by "valid", then you're finished.

However, possibly you may want "valid" to mean "extending to equal distances on either side of the point estimate" -- e.g. as you do with Estimate +/- 1.96*SE. It may be that, on the linear predictor scale, you achieve this and are also equi-tailed (2.5% either way).

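As a small numerical illustration of carrying such an interval over to the response scale (a sketch only: it is written in Python rather than R, it assumes the logistic link, and est and se are made-up values, not output from any fitted model):

```python
import math


def logistic(l):
    """Inverse logit link: F(L) = exp(L) / (1 + exp(L))."""
    return 1.0 / (1.0 + math.exp(-l))


# Hypothetical estimate of the linear predictor L(x) and its standard error.
est = 1.2
se = 0.5
z = 1.96  # normal quantile for a nominal equi-tailed 95% interval

# Interval on the linear predictor scale: Est +/- 1.96*SE.
lower_lp = est - z * se
upper_lp = est + z * se

# Apply the monotonic increasing F to the estimate and to both end-points.
p_hat = logistic(est)
lower_p = logistic(lower_lp)
upper_p = logistic(upper_lp)

print("linear predictor scale:", lower_lp, est, upper_lp)
print("response scale:        ", lower_p, p_hat, upper_p)
print("distance below / above estimate on response scale:",
      p_hat - lower_p, upper_p - p_hat)
```

With these values the transformed end-points stay strictly between 0 and 1, and the interval on the linear predictor scale is symmetric about the estimate by construction.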
But then, when you transform to the response scale, you will lose that symmetry: F(Est - 1.96*SE) and F(Est + 1.96*SE) will not be equidistant from F(Est) (though the equi-tailed 2.5%:2.5% split of the tail probability will be preserved).

If you have a reason for wanting to, you can start with a 95% CI for L(x) which is not equi-tailed, but whose transformed end-points are symmetric on the response scale, i.e. equidistant from F(Est). So you could set up the CI for L(x) as

  {A(x) = Est - c0(x)*SE, B(x) = Est + c1(x)*SE}

where c0(x) and c1(x) (which in general depend on x) are chosen so that you get symmetry on the response scale. But then you will lose the equi-tailed property on the linear predictor scale, hence also on the response scale.

So you can't have everything at once, and it depends on what you want to mean by "valid"!

However, in the case of the response being the probability of Y=1, you might want to be careful about symmetry on the response scale, since that could result in a CI which goes above 1 or below 0, which would not be "valid" ...

For large samples, asymptotically all these issues tend to dwindle into near-irrelevance, since locally the response is close to linear and whatever you achieve on one scale will be (close to) achieved on the other scale.

Hoping this helps,
Ted.

> On 7 August 2010 18:37, Peter Dalgaard <pda...@gmail.com> wrote:
>>
>> Probably, neither is optimal, although any transformed scale is
>> asymptotically equivalent. E.g., neither the probability scale
>> nor the logit scale stabilizes the variance of a simple proportion
>> (the arcsine transform does), so test-based CIs should really be
>> asymmetric in both cases rather than just +/- 1.96se.
>>
>> However, working on the linear predictor scale has the advantage
>> that CIs by definition will not cross the boundaries of the
>> parameter space.
>> (For the "usual" link functions: logit, probit,
>> cloglog, that is; it's not true for the identity link, obviously.)
>> --
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>> Phone: (+45)38153501
>> Email: pd....@cbs.dk  Priv: pda...@gmail.com

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-10                                       Time: 11:42:09
------------------------------ XFMail ------------------------------