Hello R-list, I'm attempting to migrate from Stata to R for my complex survey work. It has been straight-forward so far except for the following problem:
I have some code below, but first I'll describe the problem. When I compute predicted logits from a logistic regression, the standard errors of the predicted logits are way off (but the predicted logits are fine). Furthermore, the model logit coefficients have appropriate SEs. As a comparison, I ran the same model without the survey design; the predicted SEs come out fine. Here is example code (first no survey design model and predictions; then survey design model and predictions): > #MODEL COEF. ESTIMATES (NO SURVEY DESIGN) > model.l.nosvy <- glm(qn58~t8l,data=all.stratum,family=binomial) > summary(model.l.nosvy) Call: glm(formula = qn58 ~ t8l, family = binomial, data = all.stratum) Deviance Residuals: Min 1Q Median 3Q Max -1.310 -1.245 1.050 1.111 1.158 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.175890 0.006176 28.48 <2e-16 *** t8l -0.018643 0.001376 -13.55 <2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 145934 on 105857 degrees of freedom Residual deviance: 145750 on 105856 degrees of freedom AIC: 145754 Number of Fisher Scoring iterations: 3 > > #PREDICTED SEs > phat.l.se.logit.nosvy <- predict(model.l.nosvy,se=TRUE) > as.matrix(table(phat.l.se.logit.nosvy$se.fit)) [,1] 0.00632408017609573 14456 0.00633130215261306 15188 0.00741988836010757 12896 0.00743834214717549 10392 0.00923404822144662 13207 0.00925875968615561 15864 0.0114294663004145 12235 0.0114574202170594 11620 > > #MODEL COEF. ESTIMATES (SURVEY DESIGN) > model.l <- svyglm(qn58~t8l,design=all.svy,family=binomial) > summary(model.l) Call: svyglm(qn58 ~ t8l, design = all.svy, family = binomial) Survey design: svydesign(id = ~psu, strata = ~stratum, weights = ~weight, data = all.stratum, nest = T) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.016004 0.023267 -0.688 0.492 t8l -0.024496 0.004941 -4.958 1.13e-06 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 (Dispersion parameter for binomial family taken to be 0.934964) Number of Fisher Scoring iterations: 2 > > #PREDICTED SEs > phat.l.logit.se <- predict(model.l,se=TRUE) > as.matrix(table(phat.l.logit.se$se.fit)) [,1] 2.04867522818685 15188 2.05533753780321 14456 2.39885304369985 10392 2.41588959524594 12896 2.98273190185571 15864 3.00556161422958 13207 3.69102305734136 11620 3.71685978156846 12235 #THESE SEs are too large. ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.