You are right.

I guess only C (in the case of a linear SVM) is chosen as the value with the best score averaged over the folds.
And once C is found, the weights are recomputed over the whole training set.

If that's the case, my proposal may be misleading.

Thank you,
Roberto


From: Andy [mailto:[email protected]]
Sent: Saturday, July 26, 2014 4:42 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_

On 07/25/2014 10:30 PM, Pagliari, Roberto wrote:
Hi Andy,
Maybe it's just me, but the "left out data" threw me off. Perhaps I would
integrate it with your previous comments:

best_estimator_

estimator

Estimator that was chosen by the search, i.e. the estimator which gave the
highest average score (or smallest loss if specified) over the cross-validation folds.

best_score_

float

Highest average score of the best_estimator computed above on the left out data.


This is not entirely correct. The "best_estimator_" is retrained on the whole 
training set, while best_score_ is the average over folds.
I like your string for best_estimator_, but for best_score_ I would probably 
also say "Highest average score of the best parameter setting over 
cross-validation folds".

Pull request welcome. The current docstring warrants improvement I think ;)
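To make the distinction concrete, here is a minimal, self-contained sketch of the selection logic being described (no scikit-learn required; the per-fold scores are made-up toy numbers, and `grid_search` is a hypothetical stand-in, not the real GridSearchCV implementation):

```python
# Toy stand-in: pretend these are validation scores per (C, fold).
fold_scores = {
    0.1:  [0.80, 0.82, 0.78],
    1.0:  [0.90, 0.88, 0.92],
    10.0: [0.85, 0.91, 0.83],
}

def grid_search(fold_scores):
    # best_score_ is the highest *average over the CV folds*,
    # not a score measured on the full training set.
    mean_scores = {C: sum(s) / len(s) for C, s in fold_scores.items()}
    best_C = max(mean_scores, key=mean_scores.get)
    best_score_ = mean_scores[best_C]
    # best_estimator_ would then be a NEW fit of the model with best_C
    # on the WHOLE training set; represented here only by the parameter.
    return best_C, best_score_

best_C, best_score_ = grid_search(fold_scores)
print(best_C, best_score_)  # 1.0 0.9
```

The point of the sketch is that the returned score and the returned estimator come from different computations: the score is the cross-validation average for the winning parameter, while the estimator is refit afterwards on all of the training data.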
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
