Re: [scikit-learn] Problem with nested cross-validation example?

Albert Thomas Tue, 29 Nov 2016 02:44:24 -0800

I also get "artifact not found". And I agree with Daniel.

Once you decompose what the code is doing, you realize that it does the job.
The simplicity of the code needed to perform nested cross-validation with
scikit-learn objects is impressive, but I guess it also makes it less obvious
what is going on. So making the example clearer, either by explaining what the
code does or by adding a few comments, would be useful for others.
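
To illustrate the kind of clarification meant here, below is a minimal sketch
that puts the compact form next to a hand-unrolled outer loop. The parameter
grid, the fold counts, the random seeds and the variable names are assumptions
made for illustration and are not taken from the example itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed grid and fold settings, chosen only for illustration.
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

svm = SVC(kernel="rbf")

# Compact form: the two calls discussed in this thread.
clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
nested_scores = cross_val_score(clf, X=X, y=y, cv=outer_cv)

# Unrolled form: the same procedure with the outer loop written out.
manual_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # The outer test fold is set aside before any tuning happens.
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # The grid search only ever sees the outer training fold; inner_cv
    # splits that fold again into (inner-)train and validation sets.
    search = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
    search.fit(X_train, y_train)

    # The refitted best model is scored once on the held-out outer fold.
    manual_scores.append(search.score(X_test, y_test))

manual_scores = np.array(manual_scores)
print(nested_scores)
print(manual_scores)
```

With fixed integer seeds for both splitters, the two arrays should agree,
which shows that cross_val_score is running the grid search separately inside
each outer training fold.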


Albert

On Tue, 29 Nov 2016 at 11:19, Daniel Homola <[email protected]>
wrote:

> Hi Joel,
>
> Thanks a lot for the answer.
>
> "Each train/test split in cross_val_score holds out test data.
> GridSearchCV then splits each train set into (inner-)train and validation
> sets. "
>
> I know this is what nested CV is supposed to do, but the code is doing an
> excellent job of obscuring this. I'll try to add some clarification as
> comments later today.
>
> Cheers,
>
> d
>
>
> On 29/11/16 00:07, Joel Nothman wrote:
>
> If that clarifies, please offer changes to the example (as a pull request)
> that make this clearer.
>
> On 29 November 2016 at 11:06, Joel Nothman <[email protected]> wrote:
>
> Briefly:
>
> clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
> nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
>
>
> Each train/test split in cross_val_score holds out test data. GridSearchCV
> then splits each train set into (inner-)train and validation sets. There is
> no leakage of test set knowledge from the outer loop into the grid search
> optimisation; no leakage of validation set knowledge into the SVR
> optimisation. The outer test data are reused as training data, but within
> each split are only used to measure generalisation error.
>
> Is that clear?
>
> On 29 November 2016 at 10:30, Daniel Homola <[email protected]> wrote:
>
> Dear all,
>
>
> I was wondering if the following example code is valid:
>
>
> http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html
>
> My understanding is that the point of nested cross-validation is to
> prevent any data leakage from the inner grid-search/parameter-optimization
> CV loop into the outer model-evaluation CV loop. This can be achieved if the
> outer CV loop's test data is completely separated from the inner CV loop,
> as shown here:
>
>
> https://mlr-org.github.io/mlr-tutorial/release/html/img/nested_resampling.png
>
>
> The code in the above example, however, doesn't seem to achieve this in
> any way.
>
>
> Am I missing something here?
>
>
> Thanks a lot,
>
> dh
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
