Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-11 Thread Andreas Mueller
Which version of scikit-learn are you using? We recently (0.17) removed storing of data point indices in trees which greatly reduced the size in some cases. On 04/10/2016 09:28 AM, Piotr Płoński wrote: Thanks for comments! I put more details of my problem here http://stackoverflow.com/questio

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-11 Thread Piotr Płoński
I am using 0.17.1. Did you consider writing custom save methods for this classifier? 2016-04-11 18:11 GMT+02:00 Andreas Mueller: > Which version of scikit-learn are you using? > We recently (0.17) removed storing of data point indices in trees which > greatly reduced the size in some cases. > >
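
One way to read the "custom save methods" question is as compression applied at save time. The following is a minimal sketch under stated assumptions, not something proposed in the thread: `save_compressed` and `load_compressed` are hypothetical helper names (not scikit-learn APIs), only the Python standard library is used, and the object being saved is a stand-in nested list rather than a fitted forest. How much a real model shrinks depends on its contents.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    """Pickle obj and gzip the byte stream as it is written to disk."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Inverse of save_compressed. Only load files you trust."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted model (assumption): 50 "trees" of 1000 floats each.
model_like = [[float(i % 7) for i in range(1000)] for _ in range(50)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl.gz")
    save_compressed(model_like, path)
    restored = load_compressed(path)
    # Compare the uncompressed pickle size with the size on disk.
    raw_size = len(pickle.dumps(model_like, protocol=pickle.HIGHEST_PROTOCOL))
    gz_size = os.path.getsize(path)

print("raw pickle bytes:", raw_size)
print("gzip pickle bytes:", gz_size)
```

Any picklable estimator, such as a fitted RandomForestClassifier, could be passed in place of `model_like`; the helpers do not depend on the object's type.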

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-11 Thread Sebastian Raschka
Just curious how it could be made more efficient. ~14.9 MB for 50 trees on a 20 MB dataset doesn't sound too bad actually, since we are not pruning the trees in Random Forests. Something I could think of would be to summarize similar trees in buckets or to build a "fragment" library of shared decision rul
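
The size quoted above can be put in perspective with some back-of-the-envelope arithmetic. This is a rough sketch only: it reads 14.9 MB as 14.9e6 bytes, and the per-node cost `ASSUMED_BYTES_PER_NODE` is a hypothetical placeholder, not a figure measured from scikit-learn. The one exact fact used is that a binary tree in which every internal node has two children and which has L leaves contains 2L - 1 nodes.

```python
def nodes_in_full_binary_tree(n_leaves):
    """Node count of a binary tree whose internal nodes all have two children."""
    return 2 * n_leaves - 1


def implied_nodes_per_tree(total_bytes, n_trees, bytes_per_node):
    """Average nodes per tree implied by a total model size."""
    return total_bytes / n_trees / bytes_per_node


# Hypothetical per-node storage cost in bytes (assumption, not measured).
ASSUMED_BYTES_PER_NODE = 64

# Figures from the message: ~14.9 MB on disk, 50 trees.
nodes = implied_nodes_per_tree(14.9e6, 50, ASSUMED_BYTES_PER_NODE)
# Invert nodes = 2 * leaves - 1 to get the implied leaf count.
leaves = (nodes + 1) / 2

print("implied nodes per tree:", nodes)
print("implied leaves per tree:", leaves)
```

Because unpruned trees keep splitting until the leaves are nearly pure, the leaf count (and so the file size) tends to grow with the number of training samples rather than staying fixed.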

Re: [Scikit-learn-general] [scikit-learn-general] Why sklearn RandomForest model take a lot of disk space after save?

2016-04-11 Thread Joel Nothman
Yes, there are no doubt more efficient ways to store forests, but it seems unlikely to be a worthwhile investment. I think this is a documentation rather than an engineering issue. We frequently get issues raised that relate to "size": runtime, memory consumption, model size on disk, (in)effective

[Scikit-learn-general] sklearn Hackathon during ICML ?

2016-04-11 Thread Vighnesh Nandan Birodkar
Hello Everyone Me and Andreas were discussing the possibility of a scikit-learn hackathon during ICML this year. ICML runs from June 19 to June 24. How many of you will be around for it? And when would you prefer to have the hackathon: before or after the conference? Thanks Vighnesh -

Re: [Scikit-learn-general] sklearn Hackathon during ICML ?

2016-04-11 Thread Giorgio Patrini
Nice idea! I will be there. When picking the date, please consider that there are other AI conferences in NY all around the ICML period (UAI, COLT, IJCAI). Cheers, Giorgio On 12 Apr 2016, at 12:40 PM, Vighnesh Nandan Birodkar <vighneshbirod...@nyu.edu> wrote: Hello Everyone Me and A

Re: [Scikit-learn-general] sklearn Hackathon during ICML ?

2016-04-11 Thread Gael Varoquaux
Hi, Sorry, ICML is on the same dates as the big brain imaging conference, so I will not be able to attend (either the conference or a sprint). Cheers, Gaël On Mon, Apr 11, 2016 at 10:40:56PM -0400, Vighnesh Nandan Birodkar wrote: > Hello Everyone > Me and Andreas were discussing the possibi