Offer whatever patches you think will help.

On 29 November 2016 at 22:01, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote:
> Sorry, should've done that.
>
> Thanks for the PR. To me it isn't the actual concept of nested CV that
> needs more detailed explanation but the implementation in scikit-learn.
>
> I think it's not obvious at all for a newcomer (heck, I've been using it
> for years on and off and even I got confused) that the clf GridSearchCV
> object will carry its inner CV object into the cross_val_score function,
> which has its own outer CV object. Unless you know that in scikit-learn
> the CV object of an estimator is *NOT* overridden by the
> cross_val_score function's cv parameter, but rather the two combine into a
> nested CV, you simply cannot work out why this example works. This is the
> confusing bit, I think. Do you want me to add comments that highlight this
> issue?
>
> On 29/11/16 10:48, Joel Nothman wrote:
>
>> Wait an hour for the docs to build and you won't get "artifact not found" :)
>>
>> If you'd looked at the PR diff, you'd see I've modified the description to
>> refer directly to GridSearchCV and cross_val_score:
>>
>>> In the inner loop (here executed by GridSearchCV), the score is
>>> approximately maximized by fitting a model to each training set, and then
>>> directly maximized in selecting (hyper)parameters over the validation set.
>>> In the outer loop (here in cross_val_score), ...
>>
>> Further comments in the code are welcome.
>>
>> On 29 November 2016 at 21:42, Albert Thomas <albertthoma...@gmail.com> wrote:
>>
>> I also get "artifact not found". And I agree with Daniel.
>>
>> Once you decompose what the code is doing, you realize that it does the
>> job. The simplicity of the code to perform nested cross-validation using
>> scikit-learn objects is impressive, but I guess it also makes it less
>> obvious. So making the example clearer by explaining what the code does,
>> or by adding a few comments, can be useful for others.
>>
>> Albert
>>
>> On Tue, 29 Nov 2016 at 11:19, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote:
>>
>>> Hi Joel,
>>>
>>> Thanks a lot for the answer.
>>>
>>> "Each train/test split in cross_val_score holds out test data.
>>> GridSearchCV then splits each train set into (inner-)train and validation
>>> sets."
>>>
>>> I know this is what nested CV is supposed to do, but the code is doing an
>>> excellent job at obscuring this. I'll try and add some clarification in as
>>> comments later today.
>>>
>>> Cheers,
>>>
>>> d
>>>
>>> On 29/11/16 00:07, Joel Nothman wrote:
>>>
>>> If that clarifies, please offer changes to the example (as a pull
>>> request) that make this clearer.
>>>
>>> On 29 November 2016 at 11:06, Joel Nothman <joel.noth...@gmail.com> wrote:
>>>
>>> Briefly:
>>>
>>>     clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
>>>     nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
>>>
>>> Each train/test split in cross_val_score holds out test data.
>>> GridSearchCV then splits each train set into (inner-)train and validation
>>> sets. There is no leakage of test set knowledge from the outer loop into
>>> the grid search optimisation; no leakage of validation set knowledge into
>>> the SVR optimisation. The outer test data are reused as training data, but
>>> within each split are only used to measure generalisation error.
>>>
>>> Is that clear?
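[A self-contained sketch of the two-line pattern Joel describes, for readers following along. The estimator, grid, and splitter choices below (SVC on iris, these particular C/gamma values, 4-fold KFold) are my assumptions for illustration, not necessarily what the scikit-learn example ships with; only the GridSearchCV-inside-cross_val_score structure is the point.]

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Hypothetical grid/estimator for illustration.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# Inner loop: GridSearchCV picks hyperparameters on validation folds
# carved out of whatever training data it is given.
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Outer loop: cross_val_score holds out a test fold per split.
# Note: cv=outer_cv does NOT override clf's inner_cv; the two nest.
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
print(nested_score.mean())
```

Each of the four outer test folds is scored by a GridSearchCV that never saw it during fitting, which is exactly the no-leakage property under discussion.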
>>>
>>> On 29 November 2016 at 10:30, Daniel Homola <dani.hom...@gmail.com> wrote:
>>>
>>> Dear all,
>>>
>>> I was wondering if the following example code is valid:
>>>
>>> http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
>>>
>>> My understanding is that the point of nested cross-validation is to
>>> prevent any data leakage from the inner grid-search/param optimization CV
>>> loop into the outer model evaluation CV loop. This could be achieved if the
>>> outer CV loop's test data were completely separated from the inner loop's
>>> CV, as shown here:
>>>
>>> https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
>>>
>>> The code in the above example, however, doesn't seem to achieve this in
>>> any way.
>>>
>>> Am I missing something here?
>>>
>>> Thanks a lot,
>>>
>>> dh
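[Since the thread's complaint is that the one-liner obscures the nesting, here is an unrolled sketch of what GridSearchCV-inside-cross_val_score effectively does, matching the mlr nested-resampling picture. The estimator, grid, and splitters are my illustrative assumptions; the loop structure is the point.]

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical estimator and grid for illustration.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Outer loop: this test fold is held out entirely.
    X_train, y_train = X[train_idx], y[train_idx]

    # Inner loop: the grid search re-splits ONLY the outer training
    # data into (inner-)train and validation folds.
    gs = GridSearchCV(clone(svr), p_grid, cv=inner_cv)
    gs.fit(X_train, y_train)

    # The outer test fold is touched exactly once, for final scoring,
    # so no knowledge of it leaks into hyperparameter selection.
    scores.append(gs.score(X[test_idx], y[test_idx]))
```

Unrolled this way, the separation the mlr diagram shows is visible: hyperparameters are chosen per outer split using only that split's training portion.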
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn