Hi,

I'm pretty new to scikit-learn.  I've trained a random forest classifier
of 100 trees using the default parameters.  My data set has over 2M
examples.

2 questions:

1) I've noticed the size of the pickled model is quite large (~9 GB).
 A comparable model trained with R's randomForest package is only about 40
MB (and randomForest's defaults for tree complexity seem similar to
scikit-learn's).  I don't believe randomForest is pruning the trees, but I
could be wrong.  Any ideas what might be causing such a large difference?
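
For concreteness, here's roughly what I'm doing, scaled down to a small
synthetic stand-in (my real X and y have ~2M rows; the dataset below is
just illustrative):

import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic stand-in for the real ~2M-example dataset
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# 100 trees, everything else left at the defaults
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)

# Pickle the fitted forest to disk; on the full dataset this file
# is the one that grows to ~9 GB
with open("forest.pkl", "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)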

2) Let's say I want each tree in the forest to be built from a
200k-example sample of the 2M.  Does leaving min_density at 0.1 achieve
this, or am I misunderstanding the role of this parameter?  (Sketch of the
call I have in mind below.)
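
This is the call I mean (assuming the min_density keyword on
RandomForestClassifier in the version I'm running; X and y as in the
snippet above):

from sklearn.ensemble import RandomForestClassifier

# Does leaving min_density at 0.1 make each tree see only ~10%
# (~200k) of the examples, or does it control something else?
clf = RandomForestClassifier(n_estimators=100, min_density=0.1)
clf.fit(X, y)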

Thanks in advance for your help!

David