I’ve been using Frank Harrell’s rms package to do bootstrap model validation. Is it the case that the optimum penalization may still give a model which is substantially overfitted?
I calculated corrected R^2, optimism in R^2, and corrected slope for various penalties for a simple example: x1 <- rnorm(45) x2 <- rnorm(45) x3 <- rnorm(45) y <- x1 + 2*x2 + rnorm(45,0,3) ols0 <- ols(y ~ x1 + x2 + x3, x=TRUE, y=TRUE) corrected.Rsq <- rep(0,60) optimism.Rsq <- rep(0,60) corrected.slope <- rep(0,60) for (pen in 1:60) { olspen <- ols(y ~ x1 + x2 + x3, penalty=pen, x=TRUE, y=TRUE) val <- validate(olspen, B=200) corrected.Rsq[pen] <- val["R-square", "index.corrected"] optimism.Rsq[pen] <- val["R-square", "optimism"] corrected.slope[pen] <- val["Slope", "index.corrected"] } plot(corrected.Rsq) x11(); plot(optimism.Rsq) x11(); plot(corrected.slope) p <- pentrace(ols0, penalty=1:60) ols9 <- ols(y ~ x1 + x2 + x3, penalty=9, x=TRUE, y=TRUE) validate(ols9, B=200) index.orig training test optimism index.corrected n R-square 0.2523404 0.2820734 0.1911878 0.09088563 0.1614548 200 MSE 7.8497722 7.3525300 8.4918212 -1.13929116 8.9890634 200 Intercept 0.0000000 0.0000000 -0.1353572 0.13535717 -0.1353572 200 Slope 1.0000000 1.0000000 1.1707137 -0.17071372 1.1707137 200 pentrace tells me that of the penalties 1, 2,..., 60, corrected AIC is maximised by a penalty of 9. This is consistent with the corrected R^2 plot, which shows a maximum somewhere around 10. However, a penalty of 9 still gives an R^2 optimism of 0.09 (training R^2=0.28, test R^2=0.19), suggesting overfitting. Do we just have to live with this R^2 optimism? It can be decreased by taking a bigger penalty, but then the corrected R^2 is reduced. Also, a penalty of 9 gives a corrected slope of about 1.17 (corrected slope of 1 is achieved with a penalty of about 1 or 2). Thanks for any help/advice you can give. Mark -- Mark Seeto Statistician National Acoustic Laboratories A Division of Australian Hearing ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.