Re: [Scikit-learn-general] Size of random forest model

2013-01-08 Thread Gilles Louppe
Hi David, On 9 January 2013 02:14, David Broyles wrote: > Hi, > > I'm pretty new to scikit-learn. I've generated a random forest > (classification) of 100 trees using default attributes. My data set has > over 2M examples. > > 2 questions: > > 1) I've noticed the size of the pickled model is qu

Re: [Scikit-learn-general] Size of random forest model

2013-01-08 Thread Gael Varoquaux
On Tue, Jan 08, 2013 at 05:14:52PM -0800, David Broyles wrote: > 1) I've noticed the size of the pickled model is quite large (e.g. ~9GB).  A > comparable model trained with R's randomForest package is only about 40 MB > (and > randomForest defaults for tree complexity seem similar to scikit's).

[Scikit-learn-general] Size of random forest model

2013-01-08 Thread David Broyles
Hi, I'm pretty new to scikit-learn. I've generated a random forest (classification) of 100 trees using default attributes. My data set has over 2M examples. 2 questions: 1) I've noticed the size of the pickled model is quite large (e.g. ~9GB). A comparable model trained with R's randomForest
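
A rough back-of-the-envelope check on the figures in this post (100 trees, a pickle of roughly 9GB) may help frame the question. The sketch below computes the size per tree and, under an assumed per-node storage cost, the node count that size would imply. `BYTES_PER_NODE` and the helper names are illustrative assumptions, not values or functions taken from scikit-learn.

```python
# Back-of-the-envelope sizing for the forest described above.
# Only n_trees and pickle_bytes come from the post; BYTES_PER_NODE is an
# illustrative assumption, not a measured scikit-learn figure.

GB = 1024 ** 3

n_trees = 100
pickle_bytes = 9 * GB
BYTES_PER_NODE = 64  # hypothetical storage cost per tree node


def bytes_per_tree(total_bytes, trees):
    """Average serialized size of one tree."""
    return total_bytes / trees


def nodes_for_leaves(n_leaves):
    """Total nodes in a binary tree where every internal node has two children.

    Such a tree with n_leaves leaves has n_leaves - 1 internal nodes.
    """
    return 2 * n_leaves - 1


per_tree = bytes_per_tree(pickle_bytes, n_trees)
est_nodes = per_tree / BYTES_PER_NODE

print(f"per tree: {per_tree / 1024 ** 2:.1f} MiB")
print(f"estimated nodes per tree: {est_nodes:,.0f}")
print(f"nodes if a tree had 1,000,000 leaves: {nodes_for_leaves(1_000_000):,}")
```

If the trees are grown until the leaves are nearly pure, the node count grows with the number of training examples, so with over 2M examples a large model is plausible. Capping tree growth (for example through `max_depth` or `min_samples_leaf` on the forest) would be one way to trade accuracy for size.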

Re: [Scikit-learn-general] GridSearch for Multilabel OneVsRestClassifier?

2013-01-08 Thread Andrew Winterman
Converting to a numpy array gave me a different and strange error message: /Users/andrewwinterman/Documents/sparks-honey/classifier/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit_grid_point(X, y, base_clf, clf_params, train, test, loss_func, score_func, verbose, **fit_params) 109

Re: [Scikit-learn-general] GridSearch for Multilabel OneVsRestClassifier?

2013-01-08 Thread Andrew Winterman
X is a sparse matrix: X <926x1238 sparse matrix of type '' with 43973 stored elements in Compressed Sparse Row format> Y is a regular python list of 926 lists of strings: Y[0:10] [['29'], ['3', '24'], ['48'], ['29'], ['37'], ['3'], ['14'], ['21'], ['16', '48', '50'], ['48']]
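
For readers unfamiliar with this label layout, the sketch below turns a list of label lists like the `Y[0:10]` sample above into a 0/1 indicator matrix, one column per distinct label. `to_indicator` is a hypothetical helper written for illustration; whether this representation resolves the error discussed in the thread is not established by the message. Recent scikit-learn versions provide `MultiLabelBinarizer` for the same purpose.

```python
# Sketch: turn a list of label lists into a 0/1 indicator matrix.
# Y is the sample shown in the message; to_indicator is a hypothetical helper.

Y = [['29'], ['3', '24'], ['48'], ['29'], ['37'],
     ['3'], ['14'], ['21'], ['16', '48', '50'], ['48']]


def to_indicator(label_lists):
    """Return (classes, rows) where rows[i][j] is 1 if sample i has classes[j]."""
    # Labels are strings, so the column order is lexicographic, not numeric.
    classes = sorted({label for labels in label_lists for label in labels})
    index = {label: j for j, label in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return classes, rows


classes, indicator = to_indicator(Y)
print(classes)
for row in indicator:
    print(row)
```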

Re: [Scikit-learn-general] GridSearch for Multilabel OneVsRestClassifier?

2013-01-08 Thread Andreas Mueller
On 01/09/2013 12:38 AM, Andrew Winterman wrote: I've also posted this question to Stack Overflow. I'm trying to use GridSearch for a multilabel problem with OneVsRestClassifier as follows. #import

[Scikit-learn-general] GridSearch for Multilabel OneVsRestClassifier?

2013-01-08 Thread Andrew Winterman
I've also posted this question to Stack Overflow. I'm trying to use GridSearch for a multilabel problem with OneVsRestClassifier as follows. #imports from sklearn.svm import SVC from sklearn.pipeline import P

Re: [Scikit-learn-general] Nb of init centers != n_clusters: raise an error

2013-01-08 Thread Olivier Grisel
2013/1/8 Andreas Mueller : > Hi Gael. > Sounds good to me. +1 :) Yes please open a PR with a non regression test (+your fix obviously). -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Nb of init centers != n_clusters: raise an error

2013-01-08 Thread Andreas Mueller
Hi Gael. Sounds good to me. +1 :) Andy

Re: [Scikit-learn-general] Nb of init centers != n_clusters: raise an error

2013-01-08 Thread Jaques Grobler
+1 2013/1/8 Gael Varoquaux > Hi list, > > I have just been debugging a nasty bug in the MiniBatchKMeans that was > created by passing an init function creating fewer centers than the > required nb of clusters. Currently the code would just happily run and > output a nb of clusters corresponding t

[Scikit-learn-general] Nb of init centers != n_clusters: raise an error

2013-01-08 Thread Gael Varoquaux
Hi list, I have just been debugging a nasty bug in the MiniBatchKMeans that was created by passing an init function creating fewer centers than the required nb of clusters. Currently the code would just happily run and output a nb of clusters corresponding to the nb of centers. I'd like the code to
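
The check proposed in this thread's subject line (raise an error when the number of init centers differs from `n_clusters`) could look roughly like the sketch below. `check_init_centers` and `bad_init` are hypothetical names used for illustration; this is not the code that went into MiniBatchKMeans.

```python
# Sketch of the proposed check: fail loudly when an init callable returns
# the wrong number of centers. check_init_centers is a hypothetical helper.


def check_init_centers(centers, n_clusters):
    """Return centers unchanged, or raise ValueError on a count mismatch."""
    n_centers = len(centers)
    if n_centers != n_clusters:
        raise ValueError(
            "init returned %d centers, but n_clusters=%d"
            % (n_centers, n_clusters)
        )
    return centers


def bad_init(X, k):
    # Deliberately returns one center too few.
    return X[: k - 1]


X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]

try:
    check_init_centers(bad_init(X, 3), n_clusters=3)
except ValueError as exc:
    print("rejected:", exc)
```

Failing at initialization keeps the mismatch from silently turning into a model with fewer clusters than requested.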

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
Please have a look at https://github.com/scikit-learn/scikit-learn/pull/1538 I will now get coffee. If you have a minute, I'd love to have your feedback on https://github.com/scikit-learn/scikit-learn/pull/1518 and how I should proceed with the gradient boosting there.

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Peter Prettenhofer
great - thanks Andy! 2013/1/8 Andreas Mueller : > On 01/08/2013 09:57 AM, Andreas Mueller wrote: >> On 01/08/2013 09:49 AM, Ronnie Ghose wrote: >>> yay :) >> Sorry, I was too fast. that was not the problem :( D'oh. >> >> > yes it was. Double d'oh. I need to get some coffee, sorry

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
On 01/08/2013 09:57 AM, Andreas Mueller wrote: > On 01/08/2013 09:49 AM, Ronnie Ghose wrote: >> yay :) > Sorry, I was too fast. that was not the problem :( D'oh. > > yes it was. Double d'oh. I need to get some coffee, sorry

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
On 01/08/2013 09:49 AM, Ronnie Ghose wrote: > yay :) Sorry, I was too fast. that was not the problem :( D'oh.

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Ronnie Ghose
yay :) On Tue, Jan 8, 2013 at 3:41 AM, Andreas Mueller wrote: > Pushed a fix :)

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
Pushed a fix :)

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
The problem is that set_params doesn't return ``self``?! On 01/08/2013 09:22 AM, Peter Prettenhofer wrote: > Hi Ronnie, Andy, > > thanks for reporting (and reproducing) - I'll investigate what's going > wrong and keep you posted. > > best, > Peter > > 2013/1/8 Andreas Mueller : >> I can reprodu
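
To illustrate the diagnosis in this message, the sketch below shows how a `set_params` method that forgets to return `self` makes a chained call evaluate to `None`. Both classes are toy stand-ins invented for this example; they are not scikit-learn estimators and not the actual SGDRegressor code.

```python
# Sketch: why a set_params that forgets to return self breaks chained calls.
# Both classes are toy stand-ins, not scikit-learn estimators.


class BrokenEstimator:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        # Missing "return self": the call evaluates to None.


class FixedEstimator(BrokenEstimator):
    def set_params(self, **params):
        super().set_params(**params)
        return self  # lets callers chain on the configured estimator


broken = BrokenEstimator().set_params(alpha=0.1)
fixed = FixedEstimator().set_params(alpha=0.1)

print(broken)        # None
print(fixed.alpha)   # 0.1
```

Any caller that uses the return value of `set_params`, as a parameter search plausibly does when configuring a fresh estimator for each grid point, would end up holding `None` with the broken variant.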

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Peter Prettenhofer
Hi Ronnie, Andy, thanks for reporting (and reproducing) - I'll investigate what's going wrong and keep you posted. best, Peter 2013/1/8 Andreas Mueller : > I can reproduce. That is weird. Like really weird...

Re: [Scikit-learn-general] GridSearchCV does not work with SGDRegressor

2013-01-08 Thread Andreas Mueller
I can reproduce. That is weird. Like really weird...