Re: [R] sampling question

2007-06-28 Thread Adaikalavan Ramasamy
Let's assume your zcta data looks like this:

set.seed(12345) ## temporary for reproducibility
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
zcta
  zipcode      prop
1       A 0.7209039
2       B 0.8757732
3       C 0.7609823
4       D 0.8861246
5       E 0.4564810

This says that 72.1% of the population in zipcode A is female, ..., and 
45.6% in zipcode E is female.


Now suppose you sampled 20 people, recorded the zipcode (and other 
variables) for each, and stored the result in 'samp':

samp <- data.frame( id=1:20,
zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ])


Now, I am not sure what you want to do. But I could see two possible 
meanings from your message.

1) If you want to sample 10 observations, with each observation weighted 
INDEPENDENTLY by the proportion of women in its zipcode, try something 
like the following. The problem with this option is that the result also 
depends on how many of the observations in 'samp' fall in each zipcode.

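comb <- merge( samp, zcta, all.x=TRUE )  ## attach prop to each observation
comb <- comb[ order(comb$id), ]
## draw 10 row numbers; sample() rescales prob, so it need not sum to 1
comb[ sample( nrow(comb), 10, prob=comb$prop ), ]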
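
As a rough check on how the prob argument behaves, the sketch below repeats a single weighted draw many times and tabulates how often each item comes up. The names 'w', 'picks' and 'freq' and the weights themselves are made up for illustration and are not part of your data.

```r
set.seed(1)                      ## arbitrary seed, for reproducibility only
w <- c( A=1, B=3, C=6 )          ## hypothetical weights; they need not sum to 1

## one weighted draw per replicate, repeated 10000 times
picks <- replicate( 10000, sample( names(w), 1, prob=w ) )

## observed share of draws for each item
freq <- table(picks) / length(picks)
round( freq, 2 )
```

The shares should come out close to w/sum(w). Note that when you draw several items without replacement, as in option 1, a heavier weight still makes an observation more likely to be chosen, but its chance of ending up in the sample is no longer exactly proportional to its weight.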



2) If you want to sample x% of the observations in each zipcode, where x 
is the proportion of women in that zipcode, then this is what I would 
call stratified sampling. Try this:

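tmp <- split( samp, samp$zipcode )
out <- list()   ## one data frame per zipcode

for( z in names(tmp) ){
   df <- tmp[[z]]
   p  <- zcta[ zcta$zipcode == z, "prop" ]
   ## keep floor(p * n) rows; seq_len() is safe even if a stratum is empty
   out[[z]] <- df[ sample( seq_len(nrow(df)), floor(p*nrow(df)) ), ]
}
do.call("rbind", out)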
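
To make the per-zipcode sample sizes explicit, here is a sketch of the same idea without the for loop. It rebuilds 'zcta' and 'samp' as above so that it runs on its own; 'n_z', 'p_z', 'k_z', 'picked' and 'strat' are names I made up, and rounding down with floor() is my assumption about how to handle fractional sizes.

```r
set.seed(12345)
zcta <- data.frame( zipcode=LETTERS[1:5], prop=runif(5) )
samp <- data.frame( id=1:20,
                    zipcode=LETTERS[ sample(1:5, 20, replace=TRUE) ] )

strata <- split( samp, samp$zipcode )

## observations available in each zipcode, and its proportion of women
n_z <- vapply( strata, nrow, integer(1) )
p_z <- zcta$prop[ match( names(n_z), zcta$zipcode ) ]

## number of rows to keep per zipcode (rounded down)
k_z <- setNames( floor( p_z * n_z ), names(n_z) )

## draw k_z[z] rows at random, without replacement, within each zipcode
picked <- lapply( names(strata), function(z) {
   rows <- strata[[z]]
   rows[ sample.int( nrow(rows), k_z[[z]] ), , drop=FALSE ]
})
strat <- do.call( "rbind", picked )
```

Printing k_z next to n_z shows at a glance how many observations each zipcode contributes.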

You probably need a variant of these, but if you need further help you 
will need to provide more information and, better yet, examples.

Regards, Adai



Kirsten Beyer wrote:
> I am interested in locating a script to implement a sampling scheme
> that would basically make it more likely that a particular observation
> is chosen based on a weight associated with the observation.  I am
> trying to select a sample of ~30 census blocks from each ZIP code area
> based on the proportion of women in a ZCTA living in a particular
> block.  I want to make it more likely that a block will be chosen if
> the proportion of women in a patient's age group in a particular block
> is high. Any ideas are appreciated!
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
>



Re: [R] sampling question

2007-06-28 Thread Greg Snow
The sample function has a prob argument that sets the probability of
each element being sampled. Put your proportion of women in there and
see if that works for you.
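
A minimal sketch of that idea, applied separately within each ZIP code area, might look like the following. The data frame 'blocks' and its columns 'zipcode', 'block_id' and 'prop_women' are hypothetical stand-ins for the real census data, and the weights are random numbers used only so the example runs.

```r
set.seed(42)   # arbitrary seed so the sketch is repeatable

# hypothetical data: 3 ZIP code areas with 50 census blocks each
blocks <- data.frame(
  zipcode    = rep(c("Z1", "Z2", "Z3"), each = 50),
  block_id   = 1:150,
  prop_women = runif(150)   # made-up weights; substitute the real proportions
)

n_per_zip <- 30   # target number of blocks per ZIP code area

# within each ZIP code area, draw blocks without replacement,
# with blocks that have a higher weight more likely to be chosen
chosen <- do.call("rbind", lapply(split(blocks, blocks$zipcode), function(b) {
  k <- min(n_per_zip, nrow(b))   # cannot draw more blocks than exist
  b[sample.int(nrow(b), k, prob = b$prop_women), , drop = FALSE]
}))

table(chosen$zipcode)   # number of blocks drawn per ZIP code area
```

The prob values do not have to sum to 1, since sample rescales them internally.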

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Kirsten Beyer
> Sent: Thursday, June 28, 2007 2:00 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] sampling question
> 
> I am interested in locating a script to implement a sampling 
> scheme that would basically make it more likely that a 
> particular observation is chosen based on a weight associated 
> with the observation.  I am trying to select a sample of ~30 
> census blocks from each ZIP code area based on the proportion 
> of women in a ZCTA living in a particular block.  I want to 
> make it more likely that a block will be chosen if the 
> proportion of women in a patient's age group in a particular 
> block is high. Any ideas are appreciated!



[R] sampling question

2007-06-28 Thread Kirsten Beyer
I am interested in locating a script to implement a sampling scheme
that would basically make it more likely that a particular observation
is chosen based on a weight associated with the observation.  I am
trying to select a sample of ~30 census blocks from each ZIP code area
based on the proportion of women in a ZCTA living in a particular
block.  I want to make it more likely that a block will be chosen if
the proportion of women in a patient's age group in a particular block
is high. Any ideas are appreciated!
