Hey. I looked a bit at the code and I cannot see a reason. I thought it had something to do with "refit", but it doesn't seem so. Does anyone else have an idea?
This does seem weird. Tobias, could you provide us with a minimal script and some data to reproduce this, as a gist on GitHub?

Cheers,
Andy

----- Original Message -----
From: "Tobias Günther" <[email protected]>
To: [email protected]
Sent: Wednesday, July 25, 2012 21:49:46
Subject: Re: [Scikit-learn-general] Why do GridSearch and CrossValidation results differ?

Hi Paolo, and thanks for your answer.

I tried adding the seed to the parameters as you suggested; sadly, it didn't change the result at all. The results are also the same between multiple runs of the program, so I don't think the problem lies there. There must be some difference in how GridSearchCV and the CrossValidation work internally that we are missing.

Maybe someone else has an idea?

Best, and thanks again,
Tobias

On Wed, Jul 25, 2012 at 10:07 PM, Paolo Losi <[email protected]> wrote:

Hi Tobias,

On Wed, Jul 25, 2012 at 9:52 PM, Tobias Günther <[email protected]> wrote:

Hi everyone!

I have the following question: I'm training an SGDClassifier and running a GridSearch to find the best parameters. If I then take the "best parameters" found by the GridSearch and run a CrossValidation with the same folds I provided to the GridSearch, I get different results than before. Also, if I run CrossValidations with other parameter combinations, the results differ from what is saved for the same combination in grid_search.grid_scores_ (the difference in the example provided below is not that large, but for other parameter combinations it differs greatly). I don't really understand why this is happening - shouldn't they be the same?

Executive summary: try adding seed: [0] to the parameters.

Longer explanation: Stochastic Gradient Descent has a random component in the algorithm that justifies some oscillation in the results. In the case you reported, the oscillation is physiological. By setting seed=0 you guarantee that the random number generator used for SGD is reset to the same state for each training session.
If you see even bigger variation between training runs, it may indicate that you should increase n_iter, since you probably have not reached convergence.

Paolo

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
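For anyone landing on this thread later, here is a minimal sketch of Paolo's point. This is not Tobias's actual script: the dataset, parameter grid, and n_iter/max_iter values are made up for illustration, and current scikit-learn releases spell the RNG parameter random_state rather than seed. With the RNG pinned and the same folds passed to both, re-running cross-validation with the best parameters reproduces the GridSearchCV score:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Illustrative toy data, not the original poster's dataset.
X, y = make_classification(n_samples=500, random_state=0)

# The same fold object is reused below, so both runs see identical splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# random_state pins SGD's internal shuffling (the old "seed" parameter),
# removing the run-to-run oscillation discussed in the thread.
param_grid = {"alpha": [1e-4, 1e-3, 1e-2]}
grid = GridSearchCV(
    SGDClassifier(random_state=0, max_iter=1000), param_grid, cv=cv
)
grid.fit(X, y)

# Cross-validate the best parameters by hand, with the same folds and seed.
scores = cross_val_score(
    SGDClassifier(random_state=0, max_iter=1000, **grid.best_params_),
    X, y, cv=cv,
)
print(np.isclose(grid.best_score_, scores.mean()))
```

Without a fixed random_state (or with different fold objects for the two runs), the two mean scores can legitimately disagree, which matches the behavior Tobias reported.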
