Make sure you read and understand
http://scikit-learn.org/stable/modules/cross_validation.html. Basically,
the score of the final model on the full training data is a poor
indication of how well it will perform on other data, because the model
has already seen every one of those samples while fitting. The average
score over k folds, each evaluated on held-out test data, is a much better
indication of how well the model will generalise to new instances.
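A minimal illustration of the point above. It assumes a recent scikit-learn (1.x) and uses the bundled breast-cancer dataset with an unpruned decision tree, chosen here only as an example of a model that can memorise its training data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset, used purely for illustration (an assumption, not from the thread).
X, y = load_breast_cancer(return_X_y=True)

# An unpruned tree can fit its training data (almost) perfectly.
clf = DecisionTreeClassifier(random_state=0)

# Score on the very data the model was fit on: optimistic.
train_score = clf.fit(X, y).score(X, y)

# Accuracy on each of 5 held-out folds; the model never sees the fold it is scored on.
cv_scores = cross_val_score(clf, X, y, cv=5)

print(f"training accuracy:       {train_score:.3f}")
print(f"mean 5-fold CV accuracy: {cv_scores.mean():.3f}")
```

The training accuracy usually comes out at or near 1.0, while the cross-validated mean is noticeably lower. Only the latter tells you something about new instances.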


On 29 July 2014 07:54, Pagliari, Roberto <[email protected]> wrote:

> Hi Joel,
>
> Just to make sure I understood.
>
>
>
> - C is computed with cross validation, by finding the highest
>   average score over the k folds
>
> - Once C is found, weights are computed over the whole training
>   set.
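The two steps in the list above can be sketched as follows. This assumes a recent scikit-learn (1.x), `LinearSVC` as the linear SVM, and a synthetic dataset from `make_classification`; the grid of C values is arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic data: an assumption for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Step 1: pick C by the highest mean score over 5 CV folds (grid is arbitrary).
search = GridSearchCV(LinearSVC(dual=False, random_state=0),
                      param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X, y)  # with the default refit=True this also performs step 2
best_C = search.best_params_["C"]

# Step 2 by hand: refit a fresh model with that C on the whole training set.
manual = LinearSVC(C=best_C, dual=False, random_state=0).fit(X, y)

print("chosen C:", best_C)
print("same weights as best_estimator_:",
      np.allclose(search.best_estimator_.coef_, manual.coef_))
```

With the default `refit=True`, `best_estimator_` should be exactly this second fit, so the two weight vectors match.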
>
>
>
> If that’s the case, why is the best_score_ averaged over the k folds?
> Shouldn’t it be computed over the whole training set, since that’s the way
> the weights were determined?
>
>
>
> Thank you again for the clarification,
>
> *From:* Joel Nothman [mailto:[email protected]]
> *Sent:* Monday, July 28, 2014 10:32 AM
> *To:* scikit-learn-general
>
> *Subject:* Re: [Scikit-learn-general] gridSearchCV best_estimator_
> best_score_
>
>
>
> I do think you're right to attempt to improve it! Please submit a PR!
>
>
>
> On 29 July 2014 00:05, Pagliari, Roberto <[email protected]> wrote:
>
> You are right.
>
>
>
> I guess only C (in the case of a linear SVM) is chosen as the best
> average over the folds. Once C is found, the weights are computed over
> the whole training set.
>
>
>
> If that’s the case, my proposal may be misleading.
>
>
>
> Thank you,
>
> Roberto
>
>
>
>
>
> *From:* Andy [mailto:[email protected]]
> *Sent:* Saturday, July 26, 2014 4:42 AM
>
>
> *To:* [email protected]
> *Subject:* Re: [Scikit-learn-general] gridSearchCV best_estimator_
> best_score_
>
>
>
> On 07/25/2014 10:30 PM, Pagliari, Roberto wrote:
>
> Hi Andy,
>
> Maybe it’s just me, but the “left out data” threw me off. Perhaps I would
> integrate it with your previous comments:
>
>
>
> best_estimator_
>
> estimator
>
> Estimator that was chosen by the search, i.e. estimator which gave highest
> *average* score (or smallest loss if specified) *over the
> cross-validation folds*. on the left out data.
>
> best_score_
>
> float
>
> *Highest average score* of *the* best_estimator *computed above* on the
> left out data.
>
>
>
> This is not entirely correct. The best_estimator_ is retrained on the
> whole training set with the best parameter setting, while best_score_ is
> the average score of that setting over the held-out cross-validation folds.
> I like your string for best_estimator_, but for best_score_ I would
> probably also say "Highest average score of the best parameter setting over
> cross-validation folds".
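A sketch of the distinction drawn above, assuming a recent scikit-learn (0.18 or later, where `cv_results_` replaced the `grid_scores_` attribute of this thread's era), `LinearSVC`, synthetic data and an arbitrary grid of C values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic data and C grid: assumptions for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
search = GridSearchCV(LinearSVC(dual=False, random_state=0),
                      param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X, y)

i = search.best_index_  # row of cv_results_ for the winning parameter setting
# Held-out score of the winning setting on each of the 5 folds.
fold_scores = [search.cv_results_[f"split{k}_test_score"][i] for k in range(5)]

# best_score_ is the mean of those held-out fold scores ...
print("best_score_:       ", search.best_score_)
print("mean of fold scores:", np.mean(fold_scores))
# ... not the score of the refit best_estimator_ on the data it was trained on.
print("refit model on its own training data:", search.best_estimator_.score(X, y))
```

The last number is the kind of whole-training-set score asked about in this thread. It is computed on samples the refit model has already seen, which is why best_score_ is taken from the held-out folds instead.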
>
> Pull request welcome. The current docstring warrants improvement I think ;)
>
>
>
> ------------------------------------------------------------------------------
> Infragistics Professional
> Build stunning WinForms apps today!
> Reboot your WinForms applications with our WinForms controls.
> Build a bridge from your legacy apps to the future.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>