Does caret have a bug calculating ROC with earth? When I tune earth through
caret on any of my data sets, the ROC that caret reports is identical for
every value of nprune. One explanation would be that earth is finding the same
model each time (for example, because the nprune values are all too high to
have any effect). But if that were the case, sensitivity and specificity would
not vary either, and they do. I also verified that nprune is not too high:
earth on its own selects 9 terms, and I tune nprune over 1 to 9.
Sample output from R 2.14.0 on Windows 7 64-bit with earth 3.2-1 and
caret 5.07-001 follows.
I don't have this problem with caret and ctree.
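For reference, the ROC column from twoClassSummary is the area under the ROC curve, which depends only on how the predicted scores rank the cases. A minimal base-R sketch of that quantity follows. `auc_rank` is a hypothetical helper name (not a caret or earth function), and the toy scores are made up for illustration.

```r
# AUC via the rank-sum (Mann-Whitney) identity; base R only.
# auc_rank is a hypothetical helper, not part of caret or earth.
auc_rank <- function(score, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)  # average ranks, so ties count as half
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up): three negatives followed by three positives
is_pos  <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
score_a <- c(0.1, 0.2, 0.3, 0.7, 0.8, 0.9)  # positives all ranked on top
score_b <- c(0.1, 0.8, 0.3, 0.7, 0.2, 0.9)  # one positive ranked low

auc_a <- auc_rank(score_a, is_pos)
auc_b <- auc_rank(score_b, is_pos)
print(c(auc_a = auc_a, auc_b = auc_b))
```

Two sets of scores that rank the cases differently generally give different values here, which is why an ROC column that is constant to three digits across nprune looks suspicious.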
Andrew
R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)
> # install and load packages, as needed
> for (pkg in c('caret','earth','mlbench', 'e1071')) {
+ if (!require(pkg, character.only=T)) {install.packages(pkg)}
+ require(pkg, character.only=T)
+ }
Loading required package: caret
Loading required package: lattice
Loading required package: reshape
Loading required package: plyr
Attaching package: reshape
The following object(s) are masked from package:plyr:
rename, round_any
Loading required package: cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: earth
Loading required package: leaps
Loading required package: plotmo
Loading required package: plotrix
Loading required package: mlbench
Loading required package: e1071
Loading required package: class
Attaching package: class
The following object(s) are masked from package:reshape:
condense
>
> # system information
> installed.packages()[c('earth','caret'),'Version']
earth caret
"3.2-1" "5.07-001"
>
>
> # prepare data
> data(etitanic)
> mydata <- etitanic
> mydata$survived <- as.factor(ifelse(etitanic$survived==1, 'T', 'F'))
> summary(mydata)
 pclass    survived     sex           age              sibsp            parch
 1st:284   F:619    female:388   Min.   : 0.1667   Min.   :0.0000   Min.   :0.0000
 2nd:261   T:427    male  :658   1st Qu.:21.0000   1st Qu.:0.0000   1st Qu.:0.0000
 3rd:501                         Median :28.0000   Median :0.0000   Median :0.0000
                                 Mean   :29.8811   Mean   :0.5029   Mean   :0.4207
                                 3rd Qu.:39.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
                                 Max.   :80.0000   Max.   :8.0000   Max.   :6.0000
>
> # show natural maximum pruning is 9
> fit <- earth(survived ~ ., data=mydata)
> summary(fit, style="max")
Call: earth(formula=survived~., data=mydata)
T =
1.094732
- 0.2113713 * max(0, pclass2nd - 0)
- 0.3413489 * max(0, pclass3rd - 0)
- 0.4851343 * max(0, sexmale - 0)
- 0.004222467 * max(0, age - 10)
+ 0.02569032 * max(0, 10 - age)
- 0.09699376 * max(0, sibsp - 1)
- 0.06266133 * max(0, parch - 1)
- 0.09015484 * max(0, 1 - parch)
Selected 9 of 10 terms, and 6 of 6 predictors
Importance: sexmale, pclass3rd, age, pclass2nd, sibsp, parch
Number of terms at each degree of interaction: 1 8 (additive model)
GCV 0.1519922 RSS 153.8581 GRSq 0.3720351 RSq 0.3911174
>
> # custom metric
> twoClassSummaryPlus <- function (data,
+ lev = NULL,
+ model = NULL)
+
+ {
+ out1 <- twoClassSummary(data, lev, model)
+ out2 <- defaultSummary(data, lev, model)
+ #browser() # debug
+ #print(out1)
+ #print(dim(data))
+ c(out1, out2)
+ }
>
>
> # tune
> train_earth <- function(nprune)
+ {
+ # prepare tuning parameters
+ grid <- expand.grid(.degree=c(1), .nprune=nprune)
+
+ trControl<- trainControl(summaryFunction = twoClassSummaryPlus,
+ classProbs = T,
+ verboseIter=T)
+
+ # tune
+ mydata.best <- train(survived ~ .,
+ data = mydata,
+ method = "earth",
+ trControl = trControl,
+ metric="Sens",
+ tuneGrid=grid)
+
+ # show tuned
+ print(mydata.best)
+ }
>
> train_earth(c(1:9)) # ROC is constant
Fitting: degree=1, nprune=9
[... the same line appears 25 times in total, once per bootstrap resample ...]
Aggregating results
Selecting tuning parameters
Fitting model on full training set
1046 samples
6 predictors
2 classes: 'F', 'T'
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 1046, 1046, 1046, 1046, 1046, 1046, ...
Resampling results across tuning parameters:
  nprune  ROC    Sens   Spec   Accuracy  Kappa  ROC SD  Sens SD  Spec SD  Accuracy SD  Kappa SD
  1       0.843  1      0      0.588     0      0.0154  0        0        0.0239       0
  2       0.843  0.845  0.684  0.779     0.537  0.0154  0.0209   0.0318   0.0191       0.0393
  3       0.843  0.845  0.685  0.779     0.537  0.0154  0.0217   0.0326   0.0191       0.0392
  4       0.843  0.846  0.694  0.784     0.547  0.0154  0.0232   0.0343   0.0203       0.0412
  5       0.843  0.842  0.714  0.789     0.561  0.0154  0.0236   0.0344   0.0184       0.037
  6       0.843  0.848  0.718  0.794     0.57   0.0154  0.0222   0.0349   0.0182       0.0367
  7       0.843  0.84   0.727  0.793     0.57   0.0154  0.0279   0.0357   0.0163       0.0324
  8       0.843  0.84   0.723  0.792     0.567  0.0154  0.0276   0.0375   0.0161       0.0317
  9       0.843  0.84   0.721  0.791     0.565  0.0154  0.026    0.0389   0.0161       0.0322
Tuning parameter 'degree' was held constant at a value of 1
Sens was used to select the optimal model using the largest value.
The final values used for the model were degree = 1 and nprune = 1.
There were 15 warnings (use warnings() to see them)
>
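A possible cross-check that bypasses caret entirely is to fit earth once per nprune value and compute the area under the ROC curve by hand. The sketch below rests on several assumptions: it uses training-set predictions rather than bootstrap resamples, fits a binomial GLM on the earth terms through earth's `glm` argument, models the numeric 0/1 `survived` column of `etitanic` directly, and uses a hypothetical helper named `auc_rank`. Note that an intercept-only model (nprune = 1) gives every case the same score, so its AUC would be expected to be 0.5 rather than 0.843.

```r
# Hypothetical helper (not part of caret or earth): AUC from average ranks
auc_rank <- function(score, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(rank(score)[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Run only if earth is installed; training-set AUC, not resampled
if (requireNamespace("earth", quietly = TRUE)) {
  library(earth)
  data(etitanic)
  auc_by_nprune <- sapply(1:9, function(k) {
    # One additive earth model per nprune value, with a logistic GLM on its terms
    fit <- earth(survived ~ ., data = etitanic, degree = 1, nprune = k,
                 glm = list(family = binomial))
    p <- as.vector(predict(fit, type = "response"))
    auc_rank(p, etitanic$survived == 1)
  })
  print(round(auc_by_nprune, 3))
}
```

If the values printed here differ across nprune while caret's ROC column stays fixed, that would point at how the ROC is computed inside caret rather than at earth itself.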