>
> On the other hand, if they change, one cannot really calculate the average
> performance from the outer KFold scores.
>
>
Why not? If one sees GridSearchCV(simple_estimator) as "the best that
simple_estimator can do if we let it try several parameters", then
everything becomes consistent. You are essentially testing how good
simple_estimator can be if you give it the chance to choose its
hyperparameters using the data. You are testing the validity of
simple_estimator versus the validity of
simple_estimator(one_specific_parameter) in the face of the data at hand.

But that is the theoretical view. In practice, selecting e.g. a best
penalty can be a very noisy operation across folds, which is why some
people resort to model averaging, etc.
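To make that concrete, here is a rough sketch of the pattern (the gamma
grid and cv=3 are arbitrary placeholder values, and the module paths are
the sklearn.grid_search / sklearn.cross_validation ones used in the
tutorial quoted below):

    from sklearn import datasets, svm
    from sklearn.grid_search import GridSearchCV
    from sklearn.cross_validation import cross_val_score

    digits = datasets.load_digits()
    X, y = digits.data, digits.target

    # Inner loop: GridSearchCV tunes gamma on each training split it gets.
    clf = GridSearchCV(svm.SVC(C=1), {'gamma': [1e-4, 1e-3, 1e-2]})

    # Outer loop: cross_val_score scores "SVC plus its chance to pick gamma"
    # on folds the inner search never sees.
    scores = cross_val_score(clf, X, y, cv=3)
    print(scores.mean(), scores.std())

Each outer score then measures the whole procedure, search included, so
averaging them stays meaningful even when the selected gamma differs from
fold to fold.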
> On May 11, 2015, at 9:41 AM, Michael Eickenberg <
> michael.eickenb...@gmail.com> wrote:
>
> Sorry, I misread what you wrote. Your suggested approach is perfectly fine
> and corresponds exactly to what would happen if you did the mentioned
> cross_val_score + GridSearchCV on a train-test split of one 70-30 fold.
> Doing it several times using e.g. an outer KFold just gives you several
> scores to do some stats on.
>
> On Mon, May 11, 2015 at 3:37 PM, Michael Eickenberg <
> michael.eickenb...@gmail.com> wrote:
>
>>
>>
>> On Mon, May 11, 2015 at 3:30 PM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I stumbled upon the brief note about nested cross-validation in the
>>> online documentation at
>>> http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html#grid-search
>>> =====================
>>> Nested cross-validation
>>>
>>> >>> cross_validation.cross_val_score(clf, X_digits, y_digits)
>>> ...
>>> array([ 0.938..., 0.963..., 0.944...])
>>> Two cross-validation loops are performed in parallel: one by the
>>> GridSearchCV estimator to set gamma and the other one by cross_val_score to
>>> measure the prediction performance of the estimator. The resulting scores
>>> are unbiased estimates of the prediction score on new data.
>>> =====================
>>>
>>> I am wondering how to "use" or "interpret" those scores. For example, if
>>> the gamma parameters are set differently in the inner loops, we accumulate
>>> test scores from the outer loops that correspond to different models, so
>>> wouldn't calculating the average performance from those scores be a bad
>>> idea? If the estimated parameters differ across the inner folds, I would
>>> say that my model is not "stable" and varies a lot with respect to the
>>> chosen training fold.
>>>
>>> In general, what would speak against an approach where one just splits
>>> the initial dataset into train/test (70/30), performs a grid search (via
>>> k-fold CV) on the training set, and evaluates the model's performance on
>>> the test set?
>>>
>>
>> Nothing, except that you are probably evaluating several parameter
>> values. Choosing the best one and reporting its score is overfitting,
>> because it uses the test data to decide which parameter is best.
>>
>> In the inner CV loop you do basically that: select the best model based
>> on its evaluation on a test set. In order to evaluate the model's
>> performance "at the best selected gamma" you then need to evaluate again
>> on previously unseen data.
>>
>> This is automated in the mentioned cross_val_score + GridSearchCV loop,
>> but you can also do it by hand by splitting your data into three parts
>> instead of two.
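To spell out that "by hand" variant (a rough sketch only; the 70/30
proportion and the gamma grid are arbitrary, and the module layout is the
same old one as above):

    from sklearn import datasets, svm
    from sklearn.cross_validation import train_test_split
    from sklearn.grid_search import GridSearchCV

    digits = datasets.load_digits()
    X, y = digits.data, digits.target

    # Final test set: never used for fitting or for choosing gamma.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

    # Model selection happens only on the 70% training part (inner k-fold CV).
    search = GridSearchCV(svm.SVC(C=1), {'gamma': [1e-4, 1e-3, 1e-2]}, cv=5)
    search.fit(X_train, y_train)

    # Only now touch the held-out 30%: one honest score at the selected gamma.
    print(search.score(X_test, y_test))

The price of doing this only once is that you get a single number with no
sense of its variance, which is exactly what repeating it over an outer
KFold adds back.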
>>
>>
>>>
>>> Best,
>>> Sebastian
>>>
>>
>>
>