Max and list,

Could you advise me whether I am using the proper caret syntax to carry out leave-one-out cross-validation? In the example below, I use example data from the rda package. I use caret to tune over a grid and select an optimal value, and I think I am then using that optimal value for prediction. So there are two rounds of resampling, the first one handled by caret's train function.
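The two rounds of resampling described above can be sketched in base R, so the sketch runs without caret, pamr or rda. This is a hedged illustration under stated assumptions, not caret's API: a toy one-dimensional k-nearest-neighbour classifier (the hypothetical helpers knn_predict() and tune_k()) stands in for method = "pam", and the data are simulated. The inner loop tunes k by leave-one-out on the training rows only, and the outer loop holds out each case exactly once. Two details relate to the questions below: make.names() produces the valid level names "X1"/"X2" that the caret warning mentions, and predictions are collected in a character vector and converted to a factor at the end, because assigning a factor element into a numeric vector such as rep(-1, n) stores only its integer level code.

```r
# Sketch only: base R, simulated data, and a hypothetical 1-D k-NN stand-in for pam.
# knn_predict() and tune_k() are illustrative helpers, not caret functions.
knn_predict <- function(train_x, train_y, new_x, k) {
  nearest <- train_y[order(abs(train_x - new_x))[seq_len(k)]]
  votes <- table(nearest)             # counts for every level; ties go to the first level
  names(votes)[which.max(votes)]      # winning label as a character string
}

# Inner resampling: choose k by leave-one-out accuracy on the training rows only
tune_k <- function(x, y, k_grid) {
  acc <- sapply(k_grid, function(k) {
    mean(sapply(seq_along(y), function(i)
      knn_predict(x[-i], y[-i], x[i], k) == as.character(y[i])))
  })
  k_grid[which.max(acc)]
}

set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 3))
raw_y <- factor(rep(c("1", "2"), each = 10), levels = c("1", "2"))
# make.names() turns the levels "1"/"2" into the valid names "X1"/"X2"
y <- factor(make.names(as.character(raw_y)), levels = make.names(levels(raw_y)))
k_grid <- c(1, 3, 5)

# Outer resampling: each case is held out once and never seen by tune_k()
outer_pred <- rep(NA_character_, length(y))   # character placeholder, not numeric
for (holdout in seq_along(y)) {
  best_k <- tune_k(x[-holdout], y[-holdout], k_grid)
  outer_pred[holdout] <- knn_predict(x[-holdout], y[-holdout], x[holdout], best_k)
}
outer_pred <- factor(outer_pred, levels = levels(y))
outer_acc <- mean(outer_pred == y)            # outer leave-one-out accuracy

# For contrast: a numeric placeholder keeps only the factor's integer code
num_store <- rep(-1, 2)
num_store[1] <- y[11]                         # y[11] is "X2", stored as 2
```

With caret the inner step would be the train() call and the outer step the manual for loop; the design choice shown here is simply that the outer holdout is removed before any tuning happens.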
My overall question: it seems I must carry out the outer resampling plan manually. Is that correct?

On another note, I usually get these warnings:

1: In train.default(colon.x[-holdout, ], outcome[-holdout], method = "pam",  :
  At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X1, X2
2: executing %dopar% sequentially: no parallel backend registered

When I change the variable names, caret gives me predictions as numeric values corresponding to the ordered levels. Have I missed something here?

Thanks,
Juliet

# start example
library(caret)
# to obtain data
library(rda)
data(colon)

# add colnames
myind <- seq(1:ncol(colon.x))
mynames <- paste("A",myind,sep="")
colnames(colon.x) <- mynames

outcome <- factor(as.character(colon.y),levels=c("1","2"))
cv_index <- 1:length(outcome)
predictions <- rep(-1,length(cv_index))

pamGrid <- seq(0.1,5,by=0.2)
pamGrid <- data.frame(.threshold=pamGrid)

# manual leave-one-out
for (holdout in cv_index) {
  pamFit1 <- train(colon.x[-holdout,], outcome[-holdout],
                   method = "pam", tuneGrid= pamGrid,
                   trControl = trainControl(method = "cv"))
  predictions[holdout] = predict(pamFit1,newdata = colon.x[holdout,,drop=FALSE])
}
# end example

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] pamr_1.54        survival_2.36-12 e1071_1.6        class_7.3-3
 [5] rda_1.0.2        caret_5.15-023   foreach_1.3.5    codetools_0.2-8
 [9] iterators_1.0.5  cluster_1.14.2   reshape_0.8.4    plyr_1.7.1
[13] lattice_0.20-6

loaded via a namespace (and not attached):
[1] compiler_2.14.2 grid_2.14.2     tools_2.14.2

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.