When I was reading Sebastian's blog posts on cross-validation a few weeks ago, I also came across the nested cross-validation example in the scikit-learn docs. At first, like Daniel, I thought the example was not doing what it should be doing, but after a few minutes I realized that it was correct. So I am in favor of a bit more clarification.
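
For anyone else who stumbles over this, here is a minimal sketch of the nested setup described by the two lines Joel quotes below. The estimator (an RBF-kernel SVC), the parameter grid, and the KFold settings are my assumptions for illustration, since the quoted lines use svr, p_grid, inner_cv and outer_cv without defining them. The manual loop is not how scikit-learn implements cross_val_score internally; it only spells out the outer split by hand and asserts that no outer test index ever reaches the grid search.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Assumed estimator and grid (not defined in the quoted lines).
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Assumed splitters; a fixed integer seed makes every call to split() repeatable.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# Compact form, as quoted in the thread.
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

# The same thing with the outer loop written out by hand.
manual_scores = []
for train_idx, test_idx in outer_cv.split(X_iris, y_iris):
    X_train, y_train = X_iris[train_idx], y_iris[train_idx]

    # The inner splitter only ever sees the outer training rows, so
    # every inner train/validation index maps back into train_idx.
    for inner_train, inner_val in inner_cv.split(X_train):
        seen = set(train_idx[inner_train]) | set(train_idx[inner_val])
        assert seen.isdisjoint(test_idx)

    # Hyperparameters are tuned on the outer training rows only ...
    search = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
    search.fit(X_train, y_train)

    # ... and the outer test rows are used once, to score the refit model.
    manual_scores.append(search.score(X_iris[test_idx], y_iris[test_idx]))

manual_scores = np.array(manual_scores)
print(nested_score)
print(manual_scores)
```

With these settings I would expect the two printed arrays to agree, since both paths fit the same grid search on the same outer training rows. A given sample does end up in the grid search in other outer folds, but within any single outer split its test rows are only used for scoring, which is the point Joel makes below.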
Albert

On Tue, 29 Nov 2016 at 02:53, Sebastian Raschka <se.rasc...@gmail.com> wrote:

> On first glance, the image shown in the image and the code example seem to do/show the same thing? Maybe it would be worth adding an explanatory figure like this to the docs to clarify?
>
> > On Nov 28, 2016, at 7:07 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
> >
> > If that clarifies, please offer changes to the example (as a pull request) that make this clearer.
> >
> > On 29 November 2016 at 11:06, Joel Nothman <joel.noth...@gmail.com> wrote:
> > Briefly:
> >
> > clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
> > nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
> >
> > Each train/test split in cross_val_score holds out test data. GridSearchCV then splits each train set into (inner-)train and validation sets. There is no leakage of test set knowledge from the outer loop into the grid search optimisation; no leakage of validation set knowledge into the SVR optimisation. The outer test data are reused as training data, but within each split are only used to measure generalisation error.
> >
> > Is that clear?
> >
> > On 29 November 2016 at 10:30, Daniel Homola <dani.hom...@gmail.com> wrote:
> > Dear all,
> >
> > I was wondering if the following example code is valid:
> >
> > http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
> >
> > My understanding is, that the point of nested cross-validation is to prevent any data leakage from the inner grid-search/param optimization CV loop into the outer model evaluation CV loop. This could be achieved if the outer CV loop's test data is completely separated from the inner loop's CV, as shown here:
> >
> > https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
> >
> > The code in the above example however doesn't seem to achieve this in any way.
> >
> > Am I missing something here?
> >
> > Thanks a lot,
> > dh
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn