Hey. I looked a bit at the code and I cannot see a reason. I thought it had something to do with "refit", but it doesn't seem so. Does anyone else have an idea?
This does seem weird. Tobias, could you provide us with a minimal script and some data to reproduce this, as a gist on GitHub?

Cheers,
Andy

----- Original Message -----
From: "Tobias Günther" <[email protected]>
To: [email protected]
Sent: Wednesday, July 25, 2012 21:49:46
Subject: Re: [Scikit-learn-general] Why do GridSearch and CrossValidation results differ?

Hi Paolo, and thanks for your answer.

I tried adding the seed to the parameters as you suggested; sadly, it didn't change the result at all. The results are also the same between multiple runs of the program, so I don't think the problem lies there. There must be some difference in how GridSearchCV and the CrossValidation work internally that we are missing.

Maybe someone else has an idea?

Best, and thanks again,
Tobias

On Wed, Jul 25, 2012 at 10:07 PM, Paolo Losi <[email protected]> wrote:

Hi Tobias,

On Wed, Jul 25, 2012 at 9:52 PM, Tobias Günther <[email protected]> wrote:

Hi everyone!

I have the following question: I'm training an SGDClassifier and running a GridSearch to find the best parameters. If I then take the "best parameters" found by the GridSearch and run a CrossValidation with the same folds I provided to the GridSearch, I get different results than before. Also, if I run CrossValidations with other parameter combinations, the results differ from what is saved for the same combination in grid_search.grid_scores_ (the difference in the example provided below is not that large, but for other parameter combinations it differs greatly). I don't really understand why this is happening - shouldn't they be the same?

Executive summary: try adding seed: [0] to the parameters.

Longer explanation: Stochastic Gradient Descent has a random component in the algorithm that justifies some oscillation in the results. In the case you reported, the oscillation is physiological. By setting seed=0 you guarantee that the random number generator used for SGD is reset to the same state for each training session.
If you see even bigger variation between training runs, it may indicate that you should increase n_iter, since you probably have not reached convergence.

Paolo

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
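For anyone landing on this thread later, here is a minimal sketch of Paolo's point. This is not Tobias's actual script: the dataset, parameter grid, and n_iter/max_iter values are made up for illustration, and current scikit-learn releases spell the RNG parameter random_state rather than seed. With the RNG pinned and the same folds passed to both, re-running cross-validation with the best parameters reproduces the GridSearchCV score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Illustrative toy data, not the original poster's dataset.
X, y = make_classification(n_samples=500, random_state=0)

# The same fold object is reused below, so both runs see identical splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# random_state pins SGD's internal shuffling (the old "seed" parameter),
# removing the run-to-run oscillation discussed in the thread.
param_grid = {"alpha": [1e-4, 1e-3, 1e-2]}
grid = GridSearchCV(
    SGDClassifier(random_state=0, max_iter=1000), param_grid, cv=cv
)
grid.fit(X, y)

# Cross-validate the best parameters by hand, with the same folds and seed.
scores = cross_val_score(
    SGDClassifier(random_state=0, max_iter=1000, **grid.best_params_),
    X, y, cv=cv,
)
print(np.isclose(grid.best_score_, scores.mean()))
```

Without a fixed random_state (or with different fold objects for the two runs), the two mean scores can legitimately disagree, which matches the behavior Tobias reported.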
