Re: [R] Survey package and NAMCS data... unsure of specification

2005-10-05 Thread David L. Van Brunt, Ph.D.
Thanks! That's what I had come up with, but was unsure about it. I'm
checking the marginals against the published manuals now. Nice to have fresh
eyes on the problem! Thanks, also, for the helpful link.
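
A minimal sketch of that kind of marginal check, in base R only. The weighted total for each category is the sum of PATWT within it, and it does not depend on the strata or PSU settings (those affect only the standard errors). The toy data frame and the SEX column are assumptions made for illustration, not taken from the NAMCS file.

```r
# Toy stand-in for the NAMCS data; values and the SEX column are hypothetical.
namcs <- data.frame(
  SEX   = c("F", "M", "F", "M"),
  PATWT = c(10, 20, 30, 40)
)

# Sum the patient weights within each category to get weighted marginal totals.
weighted_totals <- xtabs(PATWT ~ SEX, data = namcs)
print(weighted_totals)
```

These totals can then be compared by eye with the corresponding published figures.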

On 10/4/05, Thomas Lumley [EMAIL PROTECTED] wrote:
>
> On Tue, 4 Oct 2005, David L. Van Brunt, Ph.D. wrote:
>
> > Hello, all.
> >
> > I wanted to use the survey package to analyze data from the National
> > Ambulatory Medical Care Survey, and am having some difficulty translating
> > the analysis keywords from one package (Stata) to the other (R). The data
> > were collected using a multistage probability sampling, and there are
> > variables included to identify the sampling units and weights. Documentation
> > from the NAMCS describes this for Stata as follows (note the variable names
> > in the data are in caps):
> >
> > The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the
> > svyset command as follows:
> > svyset pweight PATWT
> > svyset strata CSTRATM
> > svyset psu CPSUM
>
> Supposing your data frame is called 'namcs'
>
> dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weight=~PATWT, data=namcs)
>
> or perhaps
>
> dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weight=~PATWT,
>                     data=namcs, nest=TRUE)
>
> (nest=TRUE is needed if CPSUM repeats the same values in different
> strata).
>
> Also, if you have access to design variables for the multistage design you
> can use them (but it probably won't make much difference). There's a very
> brief example using the National Health Interview Study at
> http://faculty.washington.edu/tlumley/survey/example-twostage.html
>
> -thomas




--
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Survey package and NAMCS data... unsure of specification

2005-10-04 Thread David L. Van Brunt, Ph.D.
Hello, all.

I wanted to use the survey package to analyze data from the National
Ambulatory Medical Care Survey, and am having some difficulty translating
the analysis keywords from one package (Stata) to the other (R). The data
were collected using a multistage probability sampling, and there are
variables included to identify the sampling units and weights. Documentation
from the NAMCS describes this for Stata as follows (note the variable names
in the data are in caps):

The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the
svyset command as follows:
svyset pweight PATWT
svyset strata CSTRATM
svyset psu CPSUM


They provide similar instructions for SUDAAN:
SUDAAN 1-stage WR Option
The program below provides a with-replacement ultimate cluster (1-stage)
estimate of standard errors for a cross-tabulation.
PROC CROSSTAB DATA=COMB1 DESIGN=WR FILETYPE=SAS;
NEST CSTRATM CPSUM/MISSUNIT;

In R, the svydesign function is used to specify the sampling scheme, as in
this example from the documentation:

dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

 stratified on stype, with sampling weights pw. The fpc variable contains
the population size for the stratum. As the schools are sampled
independently, each record in the data frame is a separate PSU. This is
indicated by id=~1. Since the sampling weights could have been determined
from the population size an equivalent declaration would be

dstrat <- svydesign(id=~1, strata=~stype, data=apistrat, fpc=~fpc)

I get that the weights should be PATWT, and it seems that the strata
should be CSTRATM, but I'm unsure of how to handle the primary sampling
units (CPSUM).

Does anyone have any suggestions?

--
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]



Re: [R] Survey package and NAMCS data... unsure of specification

2005-10-04 Thread Thomas Lumley
On Tue, 4 Oct 2005, David L. Van Brunt, Ph.D. wrote:

> Hello, all.
>
> I wanted to use the survey package to analyze data from the National
> Ambulatory Medical Care Survey, and am having some difficulty translating
> the analysis keywords from one package (Stata) to the other (R). The data
> were collected using a multistage probability sampling, and there are
> variables included to identify the sampling units and weights. Documentation
> from the NAMCS describes this for Stata as follows (note the variable names
> in the data are in caps):
>
> The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the
> svyset command as follows:
> svyset pweight PATWT
> svyset strata CSTRATM
> svyset psu CPSUM


Supposing your data frame is called 'namcs'

dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weight=~PATWT, data=namcs)

or perhaps

dnamcs <- svydesign(id=~CPSUM, strata=~CSTRATM, weight=~PATWT,
                    data=namcs, nest=TRUE)

(nest=TRUE is needed if CPSUM repeats the same values in different 
strata).
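
A rough way to tell which of the two calls applies is to count, for each CPSUM label, how many distinct strata it turns up in. The sketch below uses base R on a small made-up data frame standing in for 'namcs'; the values are assumptions for illustration only.

```r
# Toy stand-in for the NAMCS data frame; the values are made up.
namcs <- data.frame(
  CSTRATM = c(1, 1, 1, 2, 2, 2),
  CPSUM   = c(1, 1, 2, 1, 2, 2),
  PATWT   = c(10, 12, 8, 15, 9, 11)
)

# For each PSU label, count the number of distinct strata it appears in.
strata_per_psu <- tapply(namcs$CSTRATM, namcs$CPSUM,
                         function(s) length(unique(s)))

# TRUE means at least one PSU label is reused across strata,
# which is the situation where nest=TRUE is needed.
needs_nest <- any(strata_per_psu > 1)
print(needs_nest)
```

In this toy data both PSU labels occur in both strata, so the check comes out TRUE; on the real file the answer depends on how CPSUM was coded.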

Also, if you have access to design variables for the multistage design you 
can use them (but it probably won't make much difference). There's a very 
brief example using the National Health Interview Study at
  http://faculty.washington.edu/tlumley/survey/example-twostage.html


-thomas
