Hi all, I have a dataset in which each point is assigned to one of four classes (A, B, C, or D) and to a study site. The study sites are coded with numbers from 1 to 100, and this information is stored in the vector studySites.
I want to run randomForest using stratified sampling, so I chose the option strata = factor(studySites). But I am not sure how to control the number of samples taken from each study site. I tried to use 10 points from each study site:

    mySampSize = rep(10, 100)

So my function call looks like this:

    RF = randomForest(myClass~., data=myData, mtry=5, importance=TRUE, strata = factor(studySites), sampsize=mySampSize)

But randomForest gives me the following error:

    Error in randomForest.default(m, y, ...) : sampsize can not be larger than class frequency

Does anybody have any idea why this happens?

Thank you very much,
Naiara.
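A check that may help narrow this down (a sketch, not a confirmed diagnosis): the wording of the error suggests that at least one stratum has fewer points than the sample size requested for it, so it seems worth tabulating the number of points per study site and comparing that against the requested 10. The data below are made up as a stand-in for the real studySites vector, and the names siteCounts, requested, and tooSmall are hypothetical. Capping each site's sample size at its own count is only one possible workaround, not necessarily the intended use of sampsize.

```r
# Made-up stand-in for the real studySites vector (assumption: 800 points
# spread at random over sites coded 1-100).
set.seed(1)
studySites <- sample(1:100, size = 800, replace = TRUE)

strata <- factor(studySites)
siteCounts <- table(strata)            # number of points available per site

requested <- 10                        # points we would like from each site
tooSmall <- siteCounts[siteCounts < requested]
print(tooSmall)                        # sites that cannot supply 10 points

# Possible workaround: never ask a site for more points than it has.
# The vector has one entry per level of the strata factor, in level order.
mySampSize <- pmin(requested, as.vector(siteCounts))

stopifnot(length(mySampSize) == nlevels(strata))
stopifnot(all(mySampSize <= as.vector(siteCounts)))
```

If tooSmall is empty for the real data, the cause presumably lies elsewhere. Note also that factor(studySites) only has levels for sites that actually occur in the data, so rep(10, 100) may not match the number of strata; building the vector from table(strata), as above, avoids that mismatch.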