I have carefully read the caret documentation at
http://caret.r-forge.r-project.org/training.html and the vignettes, and
everything is quite clear (the examples on the website help a lot!), but I
am still a bit confused about the relationship between two arguments to
trainControl:

"method"
"index"

and about the interplay between trainControl and the data splitting
functions in caret (e.g. createDataPartition, createResample, createFolds
and createMultiFolds).


To better frame my questions, let me use the following example from the
documentation:
*************************************
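library(caret)
data(BloodBrain)  # loads bbbDescr (predictors) and logBBB (outcome)
set.seed(1)
# 100 random 80% training partitions of the outcome
tmp <- createDataPartition(logBBB, p = .8, times = 100)
trControl <- trainControl(method = "LGOCV", index = tmp)
ctreeFit <- train(bbbDescr, logBBB, "ctree", trControl = trControl)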
*************************************

My questions are:

1) If I use createDataPartition (which I assume does stratified
bootstrapping), as in the above example, and I pass the result as index to
trainControl, do I need to use "LGOCV" as the method in my call to
trainControl? If I use another one (e.g. "cv"), what difference would it
make? In my head, once you fix index, you are fixing the type of
cross-validation, so I am not sure what role method plays if you use index.

2) What is the difference between createDataPartition and createResample?
Is it that createDataPartition does stratified bootstrapping, while
createResample doesn't?

3) How can I do **stratified** k-fold (e.g. 10-fold) cross-validation using
caret? Would the following do it?

tmp <- createFolds(logBBB, k = 10, list = TRUE, times = 100)
trControl <- trainControl(method = "cv", index = tmp)
ctreeFit <- train(bbbDescr, logBBB, "ctree", trControl = trControl)
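
For comparison, here is a minimal sketch of a variant that I suspect is closer to what is needed. It rests on my reading of the help pages, so please treat these as assumptions: createFolds has no times argument (createMultiFolds does), both functions stratify on the outcome, and index expects the rows used for training rather than the held-out rows. The choice of 5 repeats is arbitrary and only for illustration.

```r
library(caret)

data(BloodBrain)  # loads bbbDescr (predictors) and logBBB (outcome)
set.seed(1)

# Assumption: createMultiFolds() returns, for each fold of each repeat,
# the indices of the rows to train on (not the held-out rows), which is
# the form that trainControl(index = ...) expects.
folds <- createMultiFolds(logBBB, k = 10, times = 5)

# method/number/repeats are set to match the structure of `folds`.
trControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                          index = folds)

ctreeFit <- train(bbbDescr, logBBB, method = "ctree", trControl = trControl)
```

For a single, unrepeated 10-fold split, createFolds(logBBB, k = 10, returnTrain = TRUE) together with method = "cv" would presumably play the same role.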

Thanks so much in advance. caret is a fantastic package and I am eager to
learn how to use it properly.

~James

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help