Sorry, should've done that.
Thanks for the PR. To me it isn't the actual concept of nested CV that
needs more detailed explanation but its implementation in scikit-learn.
I think it's not obvious at all for a newcomer (heck, I've been using it
for years on and off and even I got confused) that the clf GridSearchCV
object carries its inner CV object into the cross_val_score
function, which has its own outer CV object. Unless you know that in
scikit-learn the CV object of an estimator is *NOT* overridden by the
cross_val_score function's cv parameter, but rather combines with it
into a nested CV, you simply cannot work out why this example works. This is
the confusing bit, I think. Do you want me to add comments that
highlight this issue?
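Something like the following, maybe: a minimal sketch of the example with the
kind of comments I mean. The names (clf, p_grid, inner_cv, outer_cv) follow the
example; using SVC and the iris data here is just my assumption to keep the
snippet self-contained and runnable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# The inner CV lives inside the estimator: GridSearchCV uses it to pick
# hyperparameters on whatever training data it is handed via fit().
clf = GridSearchCV(estimator=SVC(), param_grid=p_grid, cv=inner_cv)

# cross_val_score's cv does NOT replace clf's cv; it wraps it. Each outer
# training set is passed to clf.fit, which runs the inner CV within it, so
# the outer test folds never reach the hyperparameter search: nested CV.
nested_score = cross_val_score(clf, X=X, y=y, cv=outer_cv)
print(nested_score.mean())
```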
On 29/11/16 10:48, Joel Nothman wrote:
Wait an hour for the docs to build and you won't get "artifact not
found" :)
If you'd looked at the PR diff, you'd see I've modified the
description to refer directly to GridSearchCV and cross_val_score:
In the inner loop (here executed by |GridSearchCV|), the score is
approximately maximized by fitting a model to each training set,
and then directly maximized in selecting (hyper)parameters over
the validation set. In the outer loop (here in |cross_val_score|), ...
Further comments in the code are welcome.
On 29 November 2016 at 21:42, Albert Thomas <albertthoma...@gmail.com> wrote:
I also get "artifact not found". And I agree with Daniel.
Once you decompose what the code is doing you realize that it does
the job. The simplicity of the code for performing nested cross-validation
using scikit-learn objects is impressive, but I guess it
also makes it less obvious. So making the example clearer by
explaining what the code does, or by adding a few comments, could be
useful for others.
Albert
On Tue, 29 Nov 2016 at 11:19, Daniel Homola
<daniel.homol...@imperial.ac.uk> wrote:
Hi Joel,
Thanks a lot for the answer.
"Each train/test split in cross_val_score holds out test data.
GridSearchCV then splits each train set into (inner-)train and
validation sets. "
I know this is what nested CV is supposed to do, but the code does
an excellent job of obscuring this. I'll try and add
some clarification as comments later today.
Cheers,
d
On 29/11/16 00:07, Joel Nothman wrote:
If that clarifies, please offer changes to the example (as a
pull request) that make this clearer.
On 29 November 2016 at 11:06, Joel Nothman
<joel.noth...@gmail.com> wrote:
Briefly:
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
Each train/test split in cross_val_score holds out test
data. GridSearchCV then splits each train set into
(inner-)train and validation sets. There is no leakage of
test set knowledge from the outer loop into the grid
search optimisation; no leakage of validation set
knowledge into the SVR optimisation. The outer test data
are reused as training data, but within each split are
only used to measure generalisation error.
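That disjointness can be checked mechanically. The sketch below mimics with
plain KFold loops what GridSearchCV does internally with the outer training
set; the data shape and fold counts here are arbitrary choices of mine, not
taken from the example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples
outer_cv = KFold(n_splits=5)
inner_cv = KFold(n_splits=3)

checks = 0
for outer_train, outer_test in outer_cv.split(X):
    # The grid search is only ever fitted on the outer training portion,
    # so the inner splitter re-splits X[outer_train], never the full data.
    for inner_train, val in inner_cv.split(X[outer_train]):
        # Inner indices are positions *within* outer_train; mapping them
        # back shows neither inner-train nor validation touches outer_test.
        assert set(outer_train[inner_train]).isdisjoint(outer_test)
        assert set(outer_train[val]).isdisjoint(outer_test)
        checks += 1
print(checks)  # 5 outer folds x 3 inner folds = 15 disjointness checks
```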
Is that clear?
On 29 November 2016 at 10:30, Daniel Homola
<dani.hom...@gmail.com> wrote:
Dear all,
I was wondering if the following example code is valid:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
My understanding is that the point of nested
cross-validation is to prevent any data leakage from
the inner grid-search/parameter-optimization CV loop into
the outer model-evaluation CV loop. This would be
achieved if the outer CV loop's test data were
completely separated from the inner loop's CV, as
shown here:
https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
The code in the above example however doesn't seem to
achieve this in any way.
Am I missing something here?
Thanks a lot,
dh
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn