Hi Tobias,
On Wed, Jul 25, 2012 at 9:52 PM, Tobias Günther <[email protected]> wrote:
> Hi everyone!
>
> I have the following question: I'm training an SGDClassifier and doing a
> GridSearch to find the best parameters.
> If I then use the "best parameters" found by the GridSearch and do a
> CrossValidation with the same folds I provided to the GridSearch, I get
> different results than before.
> Also, if I do CrossValidations with other parameter combinations, the
> results differ from what is saved for the same combination in
> grid_search.grid_scores_ (the difference in the example provided below is
> not that much, but for other parameter combinations it differs greatly).
>
> I don't really understand why this is happening - shouldn't they be the
> same?
>
Executive summary:
try adding seed: [0] to your parameter grid.
Longer explanation:
Stochastic Gradient Descent has a random component in the algorithm,
so some oscillation in the results is expected; in the case you
reported, the oscillation is normal. By setting seed=0 you guarantee
that the random number generator used by SGD is reset to the same
state for each training session, so every fit on the same fold is
reproducible.
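
For example, here is a minimal sketch on made-up data (note that in
current scikit-learn the SGDClassifier parameter is named random_state
rather than seed, n_iter has become max_iter, and GridSearchCV lives in
sklearn.model_selection):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)  # toy stand-in data
    cv = KFold(n_splits=5, shuffle=True, random_state=0)       # fixed folds

    # Pinning random_state (the `seed` of this thread) fixes the RNG that
    # shuffles samples inside SGD, so repeated fits on the same fold give
    # identical scores.
    param_grid = {"alpha": [1e-4, 1e-3, 1e-2], "random_state": [0]}
    grid = GridSearchCV(SGDClassifier(max_iter=1000), param_grid, cv=cv)
    grid.fit(X, y)

    # Re-running cross-validation with the best parameters and the same
    # folds now reproduces the grid search score.
    best = SGDClassifier(max_iter=1000, **grid.best_params_)
    print(grid.best_score_, cross_val_score(best, X, y, cv=cv).mean())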
If you see even bigger variation between training runs, it may indicate
that you should increase n_iter, since the optimizer has probably not
reached convergence.
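
To see the convergence effect directly, here is a rough sketch (again
on synthetic data, using the current max_iter name for n_iter): with
too few epochs the score swings from seed to seed, and raising the
iteration count shrinks the spread.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)  # toy stand-in data

    # Score spread across several seeds as a function of epoch count;
    # tol=None disables early stopping so max_iter epochs are always run.
    for max_iter in (5, 50, 500):
        scores = [cross_val_score(SGDClassifier(max_iter=max_iter, tol=None,
                                                random_state=s),
                                  X, y, cv=5).mean()
                  for s in range(5)]
        print(max_iter, np.std(scores))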
Paolo