[R] sampling question

2007-06-28 Thread Kirsten Beyer
I am interested in locating a script to implement a sampling scheme
that would basically make it more likely that a particular observation
is chosen based on a weight associated with the observation.  I am
trying to select a sample of ~30 census blocks from each ZIP code area
based on the proportion of women in a ZCTA living in a particular
block.  I want to make it more likely that a block will be chosen if
the proportion of women in a patient's age group in a particular block
is high. Any ideas are appreciated!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sampling question

2007-06-28 Thread Greg Snow
The sample function has a prob argument that determines the
probability of each element being sampled; put your proportions of
women in there and see if that works for you.
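For example, here is a minimal sketch for a single ZIP code area. It assumes a hypothetical data frame `blocks` with one row per census block and that block's proportion of women in `prop_women`. The column names and the simulated values are made up for illustration.

```r
# Hypothetical block-level data for one ZIP code area; 'block_id' and
# 'prop_women' are illustrative names and the values are simulated.
set.seed(1)
blocks <- data.frame(block_id   = sprintf("B%03d", 1:200),
                     prop_women = runif(200))

# Draw 30 distinct blocks (row positions); 'prob' weights each draw, so
# blocks with a higher proportion of women are more likely to be picked.
chosen <- blocks[sample(nrow(blocks), size = 30, prob = blocks$prop_women), ]
head(chosen)
```

prob does not need to sum to 1, because sample() rescales it. Without replacement the weights are applied sequentially to the remaining items, so a block with a higher weight still has a higher chance of inclusion, but the inclusion probabilities are not exactly proportional to the weights.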

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 



Re: [R] sampling question

2007-06-28 Thread Adaikalavan Ramasamy
Let's assume your ZCTA data looks like this:

set.seed(12345) ## temporary for reproducibility
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
zcta
  zipcode      prop
1       A 0.7209039
2       B 0.8757732
3       C 0.7609823
4       D 0.8861246
5       E 0.4564810

This says that 72.1% of the population in zipcode A is female, ..., and 
45.6% in zipcode E is female.


Now suppose you sampled 20 people, recorded their zipcode (and
other variables), and stored them in 'samp':

samp <- data.frame( id=1:20,
                    zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ])


Now, I am not sure exactly what you want to do, but I can see two
possible readings of your message.

1) If you want to sample 10 observations, with each observation weighted
INDEPENDENTLY by the proportion of women in its zipcode, try something
like the following. The drawback of this option is that the result
depends on how often each zipcode occurs among the observations.

comb <- merge( samp, zcta, all.x=TRUE )  ## attach each person's zipcode proportion
comb <- comb[ order(comb$id), ]          ## back to id order, so id k sits in row k
comb[ sample( comb$id, 10, prob=comb$prop ), ]
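To see what option 1 does, one could repeat the weighted draw many times and compare how often people from each zipcode are picked with that zipcode's proportion of women. This is only a simulation sketch on the same toy data; `n_rep` and the `rate` column are names introduced here for illustration.

```r
# Rebuild the toy data from this message.
set.seed(12345)
zcta <- data.frame(zipcode = LETTERS[1:5], prop = runif(5))
samp <- data.frame(id = 1:20,
                   zipcode = LETTERS[sample(1:5, 20, replace = TRUE)])
comb <- merge(samp, zcta, all.x = TRUE)
comb <- comb[order(comb$id), ]   # ids are 1:20, so id k is now in row k

# Repeat option 1's weighted draw of 10 ids many times.
n_rep <- 5000
draws <- replicate(n_rep, sample(comb$id, 10, prob = comb$prop))

# Share of repetitions in which each id was selected.
comb$rate <- tabulate(draws, nbins = nrow(comb)) / n_rep

# Mean selection rate per zipcode next to that zipcode's proportion of women.
aggregate(cbind(rate, prop) ~ zipcode, data = comb, FUN = mean)
```

Zipcodes with a higher proportion of women should show a higher selection rate per person, while the total number of people drawn from a zipcode still depends on how many of the 20 people live there.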



2) If you want to sample x% of the observations in each zipcode, where
x is the percentage of women in that zipcode, then this is what I would
call stratified sampling. Try this:

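tmp <- split( samp, samp$zipcode )   ## one data frame per zipcode
out <- list()

for( z in names(tmp) ){
   df <- tmp[[z]]
   p  <- zcta[ zcta$zipcode == z, "prop" ]
   ## sample() truncates a fractional size anyway; floor() makes that explicit
   out[[z]] <- df[ sample( seq_len(nrow(df)), floor(p*nrow(df)) ), ]
}
do.call(rbind, out)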
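A variant closer to the original question (about 30 blocks from each ZIP code, each block weighted by its proportion of women) would combine the split-by-zipcode idea of option 2 with the prob weighting of option 1. Here is a sketch assuming a hypothetical block-level table `blocks` with columns `zipcode`, `block_id` and `prop_women`; the helper `sample_blocks()` and the simulated data are illustrative only.

```r
# Hypothetical block-level table; column names and data are made up.
set.seed(2007)
blocks <- data.frame(zipcode    = rep(LETTERS[1:3], times = c(80, 25, 120)),
                     block_id   = 1:225,
                     prop_women = runif(225))

# Within one ZIP code, draw up to n blocks without replacement, weighting
# each block by its proportion of women.
sample_blocks <- function(df, n = 30) {
  k <- min(n, nrow(df))                 # small ZIP codes: take every block
  df[sample(nrow(df), k, prob = df$prop_women), ]
}

# Apply per ZIP code and stack the results.
out <- do.call(rbind, lapply(split(blocks, blocks$zipcode), sample_blocks))
table(out$zipcode)
```

Blocks with a weight of 0 can never be chosen, and if a ZIP code has fewer positive-weight blocks than the requested sample size, sample() stops with an error, so real data may need a check for that.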

You probably need a variant of one of these. If you need further help,
please provide more information, or better yet, examples.

Regards, Adai


