Hi Joel,

Thanks a lot for the answer.

"Each train/test split in cross_val_score holds out test data. GridSearchCV then splits each train set into (inner-)train and validation sets. "

I know this is what nested CV is supposed to do, but the code does an excellent job of obscuring it. I'll try to add some clarifying comments later today.

Cheers,

d


On 29/11/16 00:07, Joel Nothman wrote:
If that clarifies, please offer changes to the example (as a pull request) that make this clearer.

On 29 November 2016 at 11:06, Joel Nothman <joel.noth...@gmail.com> wrote:

    Briefly:

    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
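
For anyone wanting to run the two calls above on their own, here is a minimal self-contained sketch. The definitions of svr, p_grid, inner_cv and outer_cv are not given in this thread; the values below are assumptions chosen for illustration (an RBF-kernel SVC, a small grid, 4-fold shuffled KFold) and may differ from the example on the website.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Assumed definitions (not stated in the thread): estimator and parameter grid.
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Assumed splitters: the inner one tunes parameters, the outer one scores.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

# GridSearchCV is itself an estimator; it only ever sees what fit() is given.
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# cross_val_score fits clf on each outer train set and scores it on the
# corresponding held-out outer test set.
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
print(nested_score.mean())
```

The point of the design is that clf is passed to cross_val_score as an ordinary estimator, so the grid search is rerun from scratch inside every outer training fold.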


    Each train/test split in cross_val_score holds out test data.
    GridSearchCV then splits each train set into (inner-)train and
    validation sets. There is no leakage of test set knowledge from
    the outer loop into the grid search optimisation; no leakage of
    validation set knowledge into the SVR optimisation. The outer test
    data are reused as training data in other splits, but within each
    split they are only used to measure generalisation error.
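
The separation described above can be made explicit by writing the outer loop by hand. This is only a sketch of what cross_val_score is doing conceptually, not its actual implementation; the estimator, grid and splitters are the same illustrative assumptions as before, and the leaks counter is a hypothetical check added here to count outer test rows that show up in any inner split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed settings for illustration only.
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

outer_scores = []
leaks = 0  # outer test rows seen by any inner split (hypothetical check)

for train_idx, test_idx in outer_cv.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]

    # The inner splitter indexes into X_train only; map its indices back to
    # original row numbers and compare them with the outer test rows.
    for inner_train, inner_val in inner_cv.split(X_train, y_train):
        seen = np.concatenate([train_idx[inner_train], train_idx[inner_val]])
        leaks += np.intersect1d(seen, test_idx).size

    # The grid search is fitted on the outer train set alone.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)
    search.fit(X_train, y_train)

    # The outer test rows are touched only here, to score the refitted model.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(leaks, np.mean(outer_scores))
```

Since the inner splits are drawn from the outer train rows, leaks stays at zero in every outer fold.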

    Is that clear?

    On 29 November 2016 at 10:30, Daniel Homola <dani.hom...@gmail.com> wrote:

        Dear all,


        I was wondering if the following example code is valid:

        
        http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html

        My understanding is that the point of nested cross-validation
        is to prevent any data leakage from the inner
        grid-search/parameter-optimization CV loop into the outer
        model-evaluation CV loop. This can be achieved if the outer CV
        loop's test data is kept completely separate from the inner
        loop's CV, as shown here:

        
        https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png


        The code in the above example, however, doesn't seem to achieve
        this in any way.


        Am I missing something here?


        Thanks a lot,

        dh


        _______________________________________________
        scikit-learn mailing list
        scikit-learn@python.org <mailto:scikit-learn@python.org>
        https://mail.python.org/mailman/listinfo/scikit-learn
        <https://mail.python.org/mailman/listinfo/scikit-learn>




