Hi Tobias,

On Wed, Jul 25, 2012 at 9:52 PM, Tobias Günther <[email protected]> wrote:

> Hi everyone!
>
> I have the following question: I'm training an SGDClassifier and doing a
> GridSearch to find the best parameters.
> If I then use the "best parameters" found by the GridSearch and do a
> CrossValidation with the same folds I provided to the GridSearch I get
> different results than before.
> Also if I do CrossValidations with other parameter combinations the
> results differ from what is saved for the same combination in
> grid_search.grid_scores_ (the difference in the example provided below is
> not that much, but for other parameter combinations it differs greatly).
>
> I don't really understand why this is happening - shouldn't they be the
> same?
>

Executive summary:

try adding 'seed': [0] to your parameter grid

Longer explanation:

Stochastic Gradient Descent has a random component in the
algorithm, so some oscillation in the results is expected.
The oscillation in the case you reported is within the normal range.

By setting seed=0 you guarantee that the random number generator
used by SGD is reset to the same state for each training run, so
repeated fits with the same parameters give identical results.
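
For example, a minimal sketch (the alpha values, X, y and folds are
placeholders for your own grid, data and CV folds):

    from sklearn.linear_model import SGDClassifier
    from sklearn.grid_search import GridSearchCV

    param_grid = {
        'alpha': [1e-4, 1e-3, 1e-2],  # placeholder values
        'seed': [0],                  # fix the RNG so every fit is reproducible
    }
    grid_search = GridSearchCV(SGDClassifier(), param_grid, cv=folds)
    grid_search.fit(X, y)

A cross-validation run with the best parameters (again with seed=0
and the same folds) should then reproduce the corresponding entry
in grid_search.grid_scores_.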

If you see even bigger variation between training runs, it may
indicate that you should increase n_iter, since you have probably
not reached convergence.
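
Something along these lines (50 is only an illustration; the right
value depends on your data):

    # more passes over the data; seed fixed for reproducible comparisons
    clf = SGDClassifier(n_iter=50, seed=0)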

Paolo