Sorry, I misread what you wrote. Your suggested approach is perfectly fine
and corresponds exactly to what would happen if you ran the mentioned
cross_val_score + GridSearchCV combination on a single 70-30 train/test
split. Doing it several times using e.g. an outer KFold just gives you
several scores to do some statistics on.
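For concreteness, a minimal sketch of that nested setup, assuming an SVC
with a gamma grid as in the quoted doc passage (the exact grid values,
fold counts and 0.16-era module paths are my own illustration):

from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import cross_val_score

digits = load_digits()
X_digits, y_digits = digits.data, digits.target

# inner loop: GridSearchCV picks gamma on each outer training portion
clf = GridSearchCV(SVC(), param_grid={'gamma': [1e-4, 1e-3, 1e-2]}, cv=3)

# outer loop: each of the 5 folds refits the whole grid search on its
# training part and scores the selected model on its held-out part,
# giving the several scores mentioned above
scores = cross_val_score(clf, X_digits, y_digits, cv=5)
print(scores.mean(), scores.std())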
On Mon, May 11, 2015 at 3:37 PM, Michael Eickenberg <
michael.eickenb...@gmail.com> wrote:
>
>
> On Mon, May 11, 2015 at 3:30 PM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Hi,
>> I stumbled upon the brief note about nested cross-validation in the
>> online documentation at
>> http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html#grid-search
>> =====================
>> Nested cross-validation
>>
>> >>> cross_validation.cross_val_score(clf, X_digits, y_digits)
>> ...
>> array([ 0.938..., 0.963..., 0.944...])
>> Two cross-validation loops are performed in parallel: one by the
>> GridSearchCV estimator to set gamma and the other one by cross_val_score
>> to measure the prediction performance of the estimator. The resulting
>> scores are unbiased estimates of the prediction score on new data.
>> =====================
>>
>> I am wondering how to "use" or "interpret" those scores. For example, if
>> gamma is set differently in the inner loops, the test scores we accumulate
>> from the outer loops correspond to different models, so wouldn't
>> averaging those scores be a bad idea? In other words, if the selected
>> parameters differ across the inner folds, I would say that my model is
>> not "stable" and varies a lot with respect to the chosen training fold.
>>
>> In general, what would speak against just splitting the initial dataset
>> into train/test (70/30), performing the grid search (via k-fold CV) on
>> the training set, and evaluating the model's performance on the test
>> set?
>>
>
> Nothing, except that you are probably evaluating several parameter
> values. Choosing the best one and reporting its test-set score is
> overfitting, because you used the test data to decide which parameter
> is best.
>
> In the inner CV loop you do basically that: select the best model based
> on its evaluation on a test set. In order to evaluate the model's
> performance "at the best selected gamma" you then need to evaluate it
> again on previously unseen data.
>
> This is automated in the mentioned cross_val_score + GridSearchCV loop,
> but you can also do it by hand by splitting your data into 3 parts
> (train / validation / test) instead of 2 (a sketch of this by-hand route
> is appended below).
>
>
>>
>> Best,
>> Sebastian
>>
>>
>
>
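A rough sketch of the by-hand route described above (the 70/30 split, the
grid values and the 0.16-era module paths are my own illustration;
GridSearchCV's internal k-fold CV plays the role of the validation split):

from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# model selection: k-fold CV on the 70% training part picks gamma
grid = GridSearchCV(SVC(), param_grid={'gamma': [1e-4, 1e-3, 1e-2]}, cv=5)
grid.fit(X_train, y_train)

# model assessment: score the refit best estimator once on the
# untouched 30% test set
print(grid.best_params_, grid.score(X_test, y_test))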