Hi Joel,
That’s what I thought, but I got confused by a previous comment:
“This is not entirely correct. The "best_estimator_" is retrained on the whole
training set, while best_score_ is the average over folds.
I like your string for best_estimator_, but for best_score_ I would probably
also say "Highest average score of the best parameter setting over
cross-validation folds".”
So C is the value that gives the highest average score over the k folds (and
best_score_ is the corresponding score returned by GridSearchCV).
So now the question is, how are the weights computed? Are they computed using
the whole training set (with the C found earlier), or are they averaged over
the k folds? This is not explicitly mentioned in the documentation.
I’m trying to understand what the quoted text above means.
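A minimal way to check this empirically (a sketch, assuming scikit-learn's SVC with a
linear kernel and synthetic data from make_classification; the dataset and parameter
grid are made up for illustration):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older releases
from sklearn.svm import SVC

# Hypothetical training set, just for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

grid = GridSearchCV(SVC(kernel='linear'), param_grid={'C': [0.1, 1, 10]}, cv=5)
grid.fit(X, y)

# Refit a plain SVC on the *whole* training set with the C the search selected.
best_C = grid.best_params_['C']
refit = SVC(kernel='linear', C=best_C).fit(X, y)

# If best_estimator_ is retrained on the full training set (as stated below),
# its weights should match the refit model's weights, not a per-fold average.
print(np.allclose(grid.best_estimator_.coef_, refit.coef_))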
Thank you,
Roberto
From: Joel Nothman [mailto:[email protected]]
Sent: Monday, July 28, 2014 8:28 PM
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_
Make sure you read and understand
http://scikit-learn.org/stable/modules/cross_validation.html. Basically,
getting the score of the final model on the full training data will be a poor
indication of how well the model will perform on other data. The average over the k
folds, where each fold is scored on data held out from training, is a much better
indication of how well the model will generalise to new instances.
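For example, something along these lines (a minimal sketch with made-up data; only the
comparison matters, not the particular numbers):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older releases
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(SVC(kernel='linear'), param_grid={'C': [0.1, 1, 10]}, cv=5).fit(X, y)

# Mean score over the k held-out folds for the best C: an estimate of generalisation.
print(grid.best_score_)

# Score of the refit model on the data it was trained on: typically optimistic.
print(grid.best_estimator_.score(X, y))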
On 29 July 2014 07:54, Pagliari, Roberto
<[email protected]<mailto:[email protected]>> wrote:
Hi Joel,
Just to make sure I understood.
- C is chosen by cross-validation, as the value with the highest average
score over the k folds.
- Once C is found, the weights are computed over the whole training set.
If that’s the case, why is best_score_ averaged over the k folds? Shouldn’t
it be computed over the whole training set, since that’s how the weights
were determined?
Thank you again for the clarification,
From: Joel Nothman
[mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, July 28, 2014 10:32 AM
To: scikit-learn-general
Subject: Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_
I do think you're right to attempt to improve it! Please submit a PR!
On 29 July 2014 00:05, Pagliari, Roberto
<[email protected]<mailto:[email protected]>> wrote:
You are right.
I guess only C (in the case of a linear SVM) is chosen by the best average score over
the folds. And once C is found, the weights are computed over the whole training set.
If that’s the case, my proposal may be misleading.
Thank you,
Roberto
From: Andy [mailto:[email protected]<mailto:[email protected]>]
Sent: Saturday, July 26, 2014 4:42 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_
On 07/25/2014 10:30 PM, Pagliari, Roberto wrote:
Hi Andy,
Maybe it’s just me, but the “left out data” threw me off. Perhaps I would
integrate it with your previous comments:
best_estimator_ : estimator
    Estimator that was chosen by the search, i.e. the estimator which gave the highest
    average score (or smallest loss if specified) over the cross-validation folds
    on the left out data.
best_score_ : float
    Highest average score of the best_estimator_ computed above on the left out data.
This is not entirely correct. The "best_estimator_" is retrained on the whole
training set, while best_score_ is the average over folds.
I like your string for best_estimator_, but for best_score_ I would probably
also say "Highest average score of the best parameter setting over
cross-validation folds".
Pull request welcome. The current docstring warrants improvement I think ;)
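A small sketch of the distinction (assuming synthetic data and a recent scikit-learn;
on older releases the import path is sklearn.grid_search and best_score_ may be a
sample-weighted average over the folds):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(SVC(kernel='linear'), param_grid={'C': [0.1, 1, 10]}, cv=5).fit(X, y)

# best_score_ is the average score of the best parameter setting over the CV folds ...
fold_scores = cross_val_score(SVC(kernel='linear', C=grid.best_params_['C']), X, y, cv=5)
print(np.isclose(grid.best_score_, fold_scores.mean()))

# ... while best_estimator_ is that setting retrained on the whole training set.
print(grid.best_estimator_)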