Dear community,

I have a data frame and want to split it into a learn and a test partition. However, the learnset should be balanced, i.e. each class should contain the same number of cases. I have tried and searched a lot, without success so far. Maybe you can help?
Some example code:

# generate example data
df <- data.frame(class = as.factor(sample(1:3, 20, replace = TRUE)),
                 var1 = rnorm(20, 3),
                 var2 = rnorm(20, 6))
summary(df)

# split into learn and test sets using the caret package
library(caret)
ind <- createDataPartition(df$class, p = 0.8, list = FALSE, times = 1)

# The problem is here: the class sizes are not equal
learnset <- df[ind, ]
summary(learnset)

Version info:

> R.Version()
$platform
[1] "x86_64-pc-mingw32"
$arch
[1] "x86_64"
$os
[1] "mingw32"
$system
[1] "x86_64, mingw32"
$major
[1] "2"
$minor
[1] "15.1"
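
To make the goal concrete, here is a minimal sketch in base R of what a balanced learnset could look like. It is not a caret feature, only one possible approach. It rests on assumptions that are not part of the question: the number of cases per class is set to 80% of the smallest class (rounded down, at least 1), the names n_per_class and rows_by_class are made up for illustration, and the seed is fixed only to make the run reproducible.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

# example data as in the question
df <- data.frame(class = as.factor(sample(1:3, 20, replace = TRUE)),
                 var1 = rnorm(20, 3),
                 var2 = rnorm(20, 6))

# assumption: take 80% of the smallest class (at least 1 case) from every class
n_per_class <- max(1, floor(0.8 * min(table(df$class))))

# row indices grouped by class
rows_by_class <- split(seq_len(nrow(df)), df$class)

# draw n_per_class rows without replacement inside each class;
# sample.int() on the length avoids the sample() pitfall for length-1 vectors
ind <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), n_per_class)]
}), use.names = FALSE)

learnset <- df[ind, ]   # balanced: n_per_class rows per class
testset  <- df[-ind, ]  # all remaining rows, not balanced

table(learnset$class)
```

With this sketch the test set keeps whatever is left over, so its class sizes will generally differ. If the test set should be balanced as well, the same per-class sampling would have to be repeated on the remaining rows.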

