Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-08 Thread Lars Buitinck
2013/7/8 Josh Wasserstein ribonucle...@gmail.com: Thank you Lars. I didn't see any deprecation warnings. Also, from what I can tell, one of the primary examples for model selection in the documentation uses cv as an argument to clf.fit: clf = GridSearchCV(SVC(C=1), tuned_parameters,

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-05 Thread Lars Buitinck
2013/7/4 Josh Wasserstein ribonucle...@gmail.com: I am confused, what exactly is deprecated? Was there anything in the code I sent in my emails that is deprecated? Didn't you get a deprecation warning? -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-04 Thread Andreas Mueller
On 07/03/2013 09:54 PM, Vlad Niculae wrote: Also, it's not that GridSearch is sensitive in itself, but remember you're doing LeaveOneOut, so for every grid point you are actually doing `n_samples` calls to clf.fit. Maybe one of these calls is significantly slower than others due to scaling.
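
To make the cost Vlad describes concrete, here is a rough sketch of the arithmetic. The grid values and the sample count are hypothetical, chosen only for illustration; they are not taken from Josh's code.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
n_samples = 50
param_grid = {"C": [1, 10, 100, 1000], "gamma": [1e-3, 1e-4]}

# Every combination of parameter values is one grid point.
grid_points = list(product(*param_grid.values()))

# Leave-one-out produces one train/test split per sample,
# so each grid point costs n_samples model fits.
n_fits = len(grid_points) * n_samples

print(len(grid_points), "grid points x", n_samples, "splits =", n_fits, "fits")
```

So even a small dataset can trigger hundreds of fits, and a single slow fit per grid point is enough to make the whole search look frozen.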

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-04 Thread Lars Buitinck
2013/7/4 Andreas Mueller amuel...@ais.uni-bonn.de: Why is there a cv option to fit? That is deprecated, right? Turns out we forgot this one. -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-04 Thread Lars Buitinck
2013/7/4 Lars Buitinck l.j.buiti...@uva.nl: 2013/7/4 Andreas Mueller amuel...@ais.uni-bonn.de: Why is there a cv option to fit? That is deprecated, right? Turns out we forgot this one. No, wait, I was jumping to conclusions. It is deprecated, to be removed in 0.15. fit parameters to

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-04 Thread Andreas Mueller
On 07/04/2013 12:16 PM, Lars Buitinck wrote: 2013/7/4 Lars Buitinck l.j.buiti...@uva.nl: 2013/7/4 Andreas Mueller amuel...@ais.uni-bonn.de: Why is there a cv option to fit? That is deprecated, right? Turns out we forgot this one. No, wait, I was jumping to conclusions. It is deprecated, to be

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-04 Thread Andreas Mueller
On 07/04/2013 10:38 PM, Josh Wasserstein wrote: I am confused, what exactly is deprecated? Was there anything in the code I sent in my emails that is deprecated? Yes. Passing a cross-validation class to the fit method of grid search. This was just ignored. You should have passed it to
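
A minimal sketch of the fix Andreas points to: give the cross-validation object to the grid search itself rather than to its fit method. This assumes a recent scikit-learn release, where `GridSearchCV` and `LeaveOneOut` live in `sklearn.model_selection`; at the time of this thread they were in different modules. The toy data and the parameter grid are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Hypothetical toy data: 20 samples, 3 features, two balanced classes.
rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = np.array([0, 1] * 10)

# Hypothetical parameter grid.
param_grid = {"C": [1, 10], "gamma": [0.1, 1.0]}

# The cross-validation strategy is a constructor argument of the search.
search = GridSearchCV(SVC(), param_grid, cv=LeaveOneOut())

# fit receives only the data.
search.fit(X, y)

print(search.n_splits_, "splits per grid point; best:", search.best_params_)
```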

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Josh Wasserstein
Thank you Vlad. I think you are right and there may be a problem with parallel jobs. When I run the code with the verbosity option enabled I see output coming out slowly. The strange thing is that doing a simple SVM fit is basically instantaneous (literally less than half a second), so I am not

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Josh Wasserstein
Hmm, I noticed that if I run `from sklearn import preprocessing; X = preprocessing.scale(X)` beforehand, it runs extremely fast! Why is that? Jacob On Wed, Jul 3, 2013 at 3:07 PM, Josh Wasserstein ribonucle...@gmail.com wrote: Thank you Vlad. I think you are right and there may be a problem

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Josh Wasserstein
Perhaps more oddly, why is GridSearchCV so sensitive to it (note that a simple svm.SVC().fit(X,y) without scaling was already fast)? In other words, it looks like scaling affects GridSearchCV in particular. Jacob On Wed, Jul 3, 2013 at 3:35 PM, Josh Wasserstein ribonucle...@gmail.com wrote:

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Lars Buitinck
2013/7/3 Josh Wasserstein ribonucle...@gmail.com: Hmm, I noticed that if I run `from sklearn import preprocessing; X = preprocessing.scale(X)` beforehand, it runs extremely fast! Why is that? Because support vector machines are quite sensitive to extreme feature values. You should always
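
As an illustration of the scaling step discussed here, the sketch below standardizes features before fitting an SVM. The data are hypothetical, with one feature given a deliberately large range. Wrapping the scaler in a pipeline is one common way to apply it, not something prescribed in the thread; it has the side effect that, inside a grid search, the scaler would be refit on each training split.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: the second feature is about 10,000 times larger than the first.
rng = np.random.RandomState(0)
X = rng.randn(40, 2) * np.array([1.0, 1e4])
y = (X[:, 0] > 0).astype(int)

# One-off standardization, as in the message above:
# each column ends up with zero mean and unit variance.
X_scaled = preprocessing.scale(X)
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# Alternative: put the scaler in front of the classifier in a pipeline.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X, y)
```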

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Vlad Niculae
Also, it's not that GridSearch is sensitive in itself, but remember you're doing LeaveOneOut, so for every grid point you are actually doing `n_samples` calls to clf.fit. Maybe one of these calls is significantly slower than others due to scaling. On Wed, Jul 3, 2013 at 10:42 PM, Lars Buitinck

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Olivier Grisel
2013/7/3 Josh Wasserstein ribonucle...@gmail.com: Perhaps more oddly, why is GridSearchCV so sensitive to it (note that a simple svm.SVC().fit(X,y) without scaling was already fast)? In other words, it looks like scaling affects GridSearchCV in particular. According to your logs, it's slow