This makes me a little sad. Do Albert and Daniel think the explicit reference from blurb to code proposed at https://github.com/scikit-learn/scikit-learn/pull/7949 is a sufficient remedy? Otherwise, could you please propose another clarifying change? Thanks.
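For anyone else landing on this thread: below is a minimal self-contained sketch of the pattern being discussed. The grid, folds, seed and estimator here are illustrative guesses on my part, not copied verbatim from the docs example.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X_iris, y_iris = load_iris(return_X_y=True)
    p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # illustrative grid
    svr = SVC(kernel="rbf")

    inner_cv = KFold(n_splits=4, shuffle=True, random_state=1)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

    # Inner loop: the grid search only ever sees the training portion of
    # each outer split, and subdivides *that* into train/validation folds.
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

    # Outer loop: each held-out test fold is used solely for scoring the
    # refitted best estimator; it never reaches the hyperparameter search.
    nested_scores = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    print(nested_scores.mean())

    # Equivalent explicit version of the outer loop (what the mlr figure
    # draws): the separation of outer test data from the inner CV is the
    # same, it is just spelled out by hand here.
    for train_idx, test_idx in outer_cv.split(X_iris, y_iris):
        clf.fit(X_iris[train_idx], y_iris[train_idx])  # inner CV runs here
        print(clf.score(X_iris[test_idx], y_iris[test_idx]))

If the two variants agree, that is exactly the point made below: cross_val_score wrapped around GridSearchCV already implements the nested scheme.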
On 29 November 2016 at 20:04, Albert Thomas <albertthoma...@gmail.com> wrote:

> When I was reading Sebastian's blog posts on cross-validation a few weeks
> ago I also found the example of nested cross-validation on scikit-learn. At
> first, like Daniel, I thought the example was not doing what it should be
> doing. But after a few minutes I finally realized that it was correct. So I
> am in favour of a bit more clarification.
>
> Albert
>
> On Tue, 29 Nov 2016 at 02:53, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> At first glance, the procedure shown in the image and the code example
>> seem to do/show the same thing? Maybe it would be worth adding an
>> explanatory figure like this to the docs to clarify?
>>
>> > On Nov 28, 2016, at 7:07 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>> >
>> > If that clarifies, please offer changes to the example (as a pull
>> > request) that make this clearer.
>> >
>> > On 29 November 2016 at 11:06, Joel Nothman <joel.noth...@gmail.com> wrote:
>> >
>> > Briefly:
>> >
>> > clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
>> > nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
>> >
>> > Each train/test split in cross_val_score holds out test data.
>> > GridSearchCV then splits each train set into (inner-)train and
>> > validation sets. There is no leakage of test set knowledge from the
>> > outer loop into the grid search optimisation, and no leakage of
>> > validation set knowledge into the SVR optimisation. The outer test
>> > data are reused as training data, but within each split are only used
>> > to measure generalisation error.
>> >
>> > Is that clear?
>> >
>> > On 29 November 2016 at 10:30, Daniel Homola <dani.hom...@gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > I was wondering if the following example code is valid:
>> > http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
>> >
>> > My understanding is that the point of nested cross-validation is to
>> > prevent any data leakage from the inner grid-search/parameter
>> > optimization CV loop into the outer model evaluation CV loop. This
>> > could be achieved if the outer CV loop's test data were completely
>> > separated from the inner loop's CV, as shown here:
>> > https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
>> >
>> > The code in the above example, however, doesn't seem to achieve this
>> > in any way.
>> >
>> > Am I missing something here?
>> >
>> > Thanks a lot,
>> > dh
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn