Hi, I'm pretty new to scikit-learn. I've trained a random forest classifier of 100 trees using the default parameters. My data set has over 2M examples.
Two questions:

1) I've noticed that the pickled model is quite large (~9 GB). A comparable model trained with R's randomForest package is only about 40 MB (and randomForest's defaults for tree complexity seem similar to scikit-learn's). I don't believe randomForest prunes its trees, but I could be wrong. Any ideas what might be causing such a large difference? (There's a rough sketch of how I'm measuring this in the P.S. below.)

2) Say I want each tree in the forest to be built from a 200k sample of the 2M examples. Does leaving min_density at its default of 0.1 achieve this, or am I misunderstanding the role of this hyperparameter? (The P.S. also sketches the per-tree subsampling I'm after.)

Thanks in advance for your help!

David
