On Thursday 18 June 2009, Jonathan Greenberg wrote:
> Rers:
>
> What is the preferred library/function for doing stratified random
> sampling from a dataset, given I want to control the number of samples
> (rather than the proportion of samples) per strata? Thanks!
>
> --j

Hi Jonathan!

Check out spsample in the 'sp' package for spatial-stratified random
sampling, among others. For grouped data there may be a dedicated function,
but it should be as simple as:

# some grouped data, with different means for clarity
d <- data.frame(x=rnorm(1000, mean=c(1,5,10,15)),
                g=rep(letters[1:4], times=250))

# sample 2 items (without replacement) from each group:
res <- by(d, d$g, function(i) {sample(i$x, size=2)} )
res

d$g: a
[1] 0.1931319 2.1858605
------------------------------------------------------------
d$g: b
[1] 6.020904 5.200289
------------------------------------------------------------
d$g: c
[1]  9.61317 11.14428
------------------------------------------------------------
d$g: d
[1] 15.26022 14.61383

# Then, parse the result with lapply or sapply. Or, use the plyr framework to
# extend this to multi-level stratification!

library(lattice)
library(plyr)

# two levels of grouped data:
d <- data.frame(x=rnorm(1000, mean=c(1,5,100,150)),
                g=rep(letters[1:4], times=250),
                gg=rep(c('A','B'), each=2, times=250))

# check:
bwplot(x ~ g | gg, data=d)

# use ddply():
res <- ddply(d, .variables=c('gg','g'),
             .fun=function(i) { sample(i$x, size=2)} )

# result looks ok:
  gg g          V1         V2
1  A a   0.1555472   3.196626
2  A b   4.9836106   5.559472
3  B c 100.0587593 101.723630
4  B d 150.7257066 149.865093

# might need some more work to convert that back into 'long format' for
# modeling...

Cheers,
Dylan

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
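
One possible way around the 'long format' step mentioned above: sample row
indices within each stratum instead of the x values, so the result is simply
a subset of the original rows. This is only a sketch in base R. The helper
name stratified_rows and its arguments are made up for illustration (they
are not part of sp or plyr), and it assumes every stratum holds at least n
rows.

```r
# Hypothetical helper (not from any package): draw n rows per stratum.
stratified_rows <- function(df, strata, n) {
  # combine one or more grouping columns into a single factor;
  # drop = TRUE discards combinations that never occur in the data
  groups <- interaction(df[strata], drop = TRUE)
  # split the row indices by stratum, then sample n indices within each
  idx <- lapply(split(seq_len(nrow(df)), groups), function(rows) {
    if (length(rows) < n) stop("a stratum has fewer than n rows")
    # index via sample.int() so a one-row stratum is handled correctly
    rows[sample.int(length(rows), n)]
  })
  df[unlist(idx, use.names = FALSE), , drop = FALSE]
}

# same two-level example data as above
set.seed(1)
d <- data.frame(x = rnorm(1000, mean = c(1, 5, 100, 150)),
                g = rep(letters[1:4], times = 250),
                gg = rep(c("A", "B"), each = 2, times = 250))

res <- stratified_rows(d, strata = c("gg", "g"), n = 2)

# check: number of sampled rows in each gg/g combination
table(res$gg, res$g)
```

The result keeps the x, g and gg columns one row per sampled observation,
so it can go straight into a model call. Passing strata = "g" alone should
give the single-level case.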