I don't think that is a good way to train online forests: such an
online estimator is not consistent (from a statistical standpoint [1]),
which is not what users would expect:

assume you have a big dataset with millions of samples and that you are
calling partial_fit with a fixed chunk size of 1000 samples: the trees
in the forest will never be able to grow deeper than 1000 levels, while
the hypothetical optimal tree that you would get by training on the full
dataset could potentially be millions of nodes deep (and hence capture
much finer non-linear interactions between features).
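
Here is a minimal sketch of that effect, under the assumption of a
synthetic regression dataset (forests do not implement partial_fit, so
plain DecisionTreeRegressor fits stand in for the individual trees a
chunked partial_fit would grow):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100000, 10)
    y = np.sin(20 * X[:, 0]) + 0.1 * rng.rand(100000)

    # Tree grown on a single 1000-sample chunk: it can have at most
    # 1000 leaves, so its depth and node count are capped accordingly.
    chunk_tree = DecisionTreeRegressor(random_state=0).fit(X[:1000], y[:1000])

    # Tree grown on the full dataset: it can keep splitting much deeper
    # and capture finer structure in the data.
    full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

    print("chunk tree: depth=%d, nodes=%d"
          % (chunk_tree.tree_.max_depth, chunk_tree.tree_.node_count))
    print("full tree:  depth=%d, nodes=%d"
          % (full_tree.tree_.max_depth, full_tree.tree_.node_count))

The chunk tree is structurally bounded by its 1000 samples no matter how
many more chunks would follow, which is the crux of the inconsistency
argument above.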

There is a more complex online random forest algorithm that is actually
provably consistent, but it would not be trivial to implement and is
very new, so it may not yet be suitable for inclusion in scikit-learn:

http://arxiv.org/pdf/1302.4853.pdf

[1] http://en.wikipedia.org/wiki/Consistency_%28statistics%29

-- 
Olivier
