Try this. ptrain and ptest are the proportions of cases in the training and test samples. The set.seed line generates a random vector of factor level codes, f, for testing purposes.

    ptrain <- 0.3; ptest <- 0.2
    set.seed(1); f <- cut(runif(100), 3, labels = FALSE)

    first <- function(x, p) x[seq_len(ceiling(p * length(x)))]
    # permute by position: sample(i) would draw from 1:i if i had length 1
    perms <- lapply(split(seq_along(f), f), function(i) i[sample.int(length(i))])
    train <- lapply(perms, function(x) first(x, ptrain))
    test <- lapply(perms, function(x) first(rev(x), ptest))

first takes a vector and a proportion and returns that proportion of elements from the beginning of the vector. Assuming p > 0 and a non-empty vector, it always returns at least one element.

perms is a list holding a random permutation of the case numbers at each level.

Finally, in the last two statements, we take elements off the beginning of each permutation for the training set and off the end for the test set. The two sets do not overlap as long as the training and test counts in a level together do not exceed the number of cases in that level.

At the end, train and test are each lists of vectors of case numbers representing the training and testing samples.

---
Date: Fri, 20 Feb 2004 18:55:51 -0800
From: Jonathan Greenberg <[EMAIL PROTECTED]>
To: R-help <[EMAIL PROTECTED]>
Subject: [R] Stratified random sampling in R?

Is there an easy way to do stratified random sampling based on a factor column in R? E.g. I want to extract a random 10% of the data from a dataset for each class (so each class may have a different number of entries, depending on its size).

On a related note, if this is easily doable, is there an easy way to extract TWO non-overlapping stratified random sample datasets (e.g. if I want to have a training and a test dataset)?

Thanks!

--
Jonathan Greenberg
Graduate Group in Ecology, U.C. Davis
http://www.cstars.ucdavis.edu/~jongreen
http://www.cstars.ucdavis.edu
AIM: jgrn307 or jgrn3007
MSN: [EMAIL PROTECTED] or [EMAIL PROTECTED]
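
For the data-frame case described in the question, the same permute-then-take-from-both-ends idea might be sketched as follows. This is only an illustrative sketch under stated assumptions, not code from the thread: the data frame dat, its factor column class, and the helper strat_split are hypothetical names. The helper also caps the test count per class so that the two samples cannot share a row even in very small classes.

```r
# Hypothetical example data: a data frame with a factor column `class`.
set.seed(1)
dat <- data.frame(
  value = rnorm(100),
  class = factor(sample(c("a", "b", "c"), 100, replace = TRUE))
)

# Hypothetical helper: split row numbers by stratum, shuffle each stratum,
# then take training rows from the front and test rows from the back.
strat_split <- function(strata, ptrain, ptest) {
  stopifnot(ptrain > 0, ptest > 0, ptrain + ptest <= 1)
  idx <- lapply(split(seq_along(strata), strata), function(i) {
    # Permute by position; sample(i) would draw from 1:i if i had length 1.
    i <- i[sample.int(length(i))]
    n_train <- ceiling(ptrain * length(i))
    # Cap the test size so train and test never share a row.
    n_test <- min(ceiling(ptest * length(i)), length(i) - n_train)
    list(train = i[seq_len(n_train)], test = rev(i)[seq_len(n_test)])
  })
  list(
    train = unlist(lapply(idx, `[[`, "train"), use.names = FALSE),
    test  = unlist(lapply(idx, `[[`, "test"),  use.names = FALSE)
  )
}

s <- strat_split(dat$class, ptrain = 0.3, ptest = 0.2)
train_df <- dat[s$train, ]  # stratified training rows
test_df  <- dat[s$test, ]   # stratified test rows, disjoint from train_df
```

Returning row numbers rather than subsetted data keeps the helper usable with any object indexed by case, and table(train_df$class) gives a quick check of how many rows were drawn per class.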
