On Mon, May 11, 2015 at 3:30 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:
> Hi,
> I stumbled upon the brief note about nested cross-validation in the online
> documentation at
> http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html#grid-search
> =====================
> Nested cross-validation
> >>> cross_validation.cross_val_score(clf, X_digits, y_digits)
> array([ 0.938..., 0.963..., 0.944...])
> Two cross-validation loops are performed in parallel: one by the
> GridSearchCV estimator to set gamma and the other one by cross_val_score to
> measure the prediction performance of the estimator. The resulting scores
> are unbiased estimates of the prediction score on new data.
> =====================
>
> I am wondering how to "use" or "interpret" those scores. For example, if
> the gamma parameters are set differently in the inner loops, the test
> scores we accumulate from the outer loops correspond to different models,
> so calculating an average performance from those scores wouldn't be a good
> idea, would it? And if the estimated parameters differ across the inner
> folds, I would say that my model is not "stable" and varies a lot with
> respect to the chosen training fold.
>
> In general, what would speak against an approach to just split the initial
> dataset into train/test (70/30), perform grid search (via k-fold CV) on the
> training set, and evaluate the model performance on the test dataset?
>
Nothing, except that you are probably evaluating several parameter values.
Choosing the best one and reporting that score is overfitting, because it
uses the test data to decide which parameter is best.
In the inner CV loop you do essentially that: select the best model based on
its evaluation on a test set. To estimate the model's performance "at the
best selected gamma" you then need to evaluate it again on previously
unseen data.
This is automated in the cross_val_score + GridSearchCV loop quoted above,
but you can also do it by hand by splitting your data into 3 parts instead
of 2 (train / validation / test).
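To make that concrete, here is a sketch of both approaches on the digits
data. Note the module paths are hedged: I'm using the current
sklearn.model_selection layout, whereas the docs quoted above use the older
cross_validation / grid_search modules; the gamma grid and cv=3 are just
illustrative choices, not the values from the docs.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X_digits, y_digits = load_digits(return_X_y=True)

# Inner loop: GridSearchCV picks gamma on each training split.
param_grid = {"gamma": np.logspace(-6, -1, 6)}
clf = GridSearchCV(SVC(), param_grid, cv=3)

# Outer loop: cross_val_score evaluates the whole *procedure*
# "fit an SVC with gamma tuned by grid search" on held-out folds,
# so each outer score comes from data unseen by the inner search.
nested_scores = cross_val_score(clf, X_digits, y_digits, cv=3)
print(nested_scores)

# The by-hand alternative: split once into train/test, run the grid
# search (k-fold CV) on the training part only, then evaluate a single
# time on the untouched test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_digits, y_digits, test_size=0.3, random_state=0)
clf.fit(X_tr, y_tr)                 # inner CV happens on X_tr only
test_score = clf.score(X_te, y_te)  # estimate at the best selected gamma
print(clf.best_params_, test_score)
```

The point in both variants is the same: the data used to choose gamma is
never the data used to report the final score.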
>
> Best,
> Sebastian
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>