All: I need to generate confidence intervals for differences in proportions using data from a complex survey design. An example follows where I attempt to estimate the difference in depression prevalence by sex.
# Data might look something like this: Dfr<-data.frame(depression=sample(c("yes","no"), size=30, replace=TRUE), sex=sample(c("M","F"), size=30, replace=TRUE), cluster=rep(1:10, times=3), stratum=rep(1:5, each=2, times=3), pweight=runif(n=30, min=1, max=3)) Dfr library(survey) msdesign<-svydesign(id=~cluster, strata=~stratum, weights=~pweight, nest=TRUE, data=Dfr) # When searching online, one recommendation was to use svyglm() to generate an # approximation as follows: confint(with(Dfr, svyglm(I(depression=="yes")~sex, family=gaussian(link=identity), msdesign)), level=0.95, method="Wald") This question has been asked before on the listserv (circa 2007) and I contacted the original poster, who indicated that they never received a reply. Here is the question as described by the original poster: "I'm trying to get confidence intervals of proportions (sometimes for subgroups) estimated from complex survey data. Because a function like prop.test() does not exist for the "survey" package I tried the following: 1) Define a survey object (PSU of clustered sample, population weights); 2) Use svyglm() of the package "survey" to estimate a binary logistic regression (family='binomial'): For the confidence interval of a single proportion regress the binary dependent variable on a constant (1), for confidence intervals of that variable for subgroups regress this variable on the groups (factor) variable; 3) Use predict() to obtain estimated logits and the respective standard errors (mod.dat specifying either the constant or the subgroups): pred=predict(model,mod.dat,type='link',se.fit=T) and apply the following to obtain the proportion with its confidence intervals (for example, for conf.level=.95): lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred) hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred) prop = 1/(1+exp(-pred[1:length(pred)])) lo = 1/(1+exp(-lo.e)) hi = 1/(1+exp(-hi.e)) I think that in that way I get CI's based on asymptotic normality - either for a single proportion or split up into subgroups. Question: Is this a correct or a defensible procedure? Or should I use a different approach? Note that this approach should also allow to estimate CI's for proportions of subgroups taking into account the complex survey design." Thanks in advance for any help that you can provide. Tony ------------------------------------------------------------------------------ Tony N. Brown, Ph.D. Associate Chair and Associate Professor of Sociology Google Scholar Profile: http://tinyurl.com/lozlht8 LinkedIn Profile: https://www.linkedin.com/pub/tony-nicholas-brown/a6/64/31a ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.