Hi,

I'm pretty new to scikit-learn.  I've trained a random forest classifier
of 100 trees using the default parameters.  My data set has over 2M
examples.

2 questions:

1) I've noticed the size of the pickled model is quite large (~9 GB).
 A comparable model trained with R's randomForest package is only about 40
MB (and randomForest's defaults for tree complexity seem similar to
scikit-learn's).  I don't believe randomForest is pruning the trees, but I
could be wrong.  Any ideas what might be causing such a large difference?
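
For concreteness, here's roughly what I'm doing, scaled down to a small
synthetic stand-in (my real X and y have ~2M rows; the dataset below is
just illustrative):

import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic stand-in for the real ~2M-example dataset
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# 100 trees, everything else left at the defaults
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)

# Pickle the fitted forest to disk; on the full dataset this file
# is the one that grows to ~9 GB
with open("forest.pkl", "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)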

2) Let's say I want each tree in the forest to be built from a
200k-example sample of the 2M.  Does leaving min_density at 0.1 achieve
this, or am I misunderstanding the role of this parameter?  (Sketch of the
call I have in mind below.)
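
This is the call I mean (assuming the min_density keyword on
RandomForestClassifier in the version I'm running; X and y as in the
snippet above):

from sklearn.ensemble import RandomForestClassifier

# Does leaving min_density at 0.1 make each tree see only ~10%
# (~200k) of the examples, or does it control something else?
clf = RandomForestClassifier(n_estimators=100, min_density=0.1)
clf.fit(X, y)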

Thanks in advance for your help!

David