I don't think this is a good way to train online forests, as such an online estimator is not consistent (from a statistical standpoint [1]) and users would not expect that:
assume you have a big dataset with millions of samples and that you are calling partial_fit with a fixed chunk size of 1000 samples: the trees in the forest will never be able to grow deeper than 1000 levels, since a tree trained on only 1000 samples cannot have more than 1000 leaves, while the hypothetical optimal tree that you would get by training on the full dataset could potentially have millions of nodes in depth (and hence capture much finer non-linear interactions between features). A short sketch illustrating this chunked training pattern is included below.

There exists a more complex formulation of online random forests that is actually provably consistent, but it would not be trivial to implement and is very recent, so it is probably not yet suitable for inclusion in scikit-learn:

http://arxiv.org/pdf/1302.4853.pdf

[1] http://en.wikipedia.org/wiki/Consistency_%28statistics%29

--
Olivier
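
For illustration, here is a minimal sketch of how such chunk-wise training could be emulated with a forest's warm_start mechanism (assuming a scikit-learn version where forests support warm_start; the dataset, chunk_size and trees_per_chunk values are arbitrary choices for the example, and this is not a real partial_fit). It makes the limitation concrete: every tree is fit on a single 1000-sample chunk, so no tree can ever have more than 1000 leaves.

# Sketch (hypothetical usage, not an existing partial_fit): emulate
# chunked "online" forest training by growing new trees on each chunk
# with warm_start. Every tree sees at most `chunk_size` samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100000, n_features=20, random_state=0)

chunk_size = 1000        # the fixed chunk size discussed above
trees_per_chunk = 5      # arbitrary number of new trees per chunk

forest = RandomForestClassifier(
    n_estimators=trees_per_chunk, warm_start=True, random_state=0)

for start in range(0, X.shape[0], chunk_size):
    X_chunk = X[start:start + chunk_size]
    y_chunk = y[start:start + chunk_size]
    if start > 0:
        # keep the already-fitted trees and add new ones that are
        # trained only on the current chunk
        forest.set_params(n_estimators=forest.n_estimators + trees_per_chunk)
    forest.fit(X_chunk, y_chunk)

# 100 chunks x 5 trees = 500 trees in total, but none of them ever saw
# more than 1000 samples, hence none can have more than 1000 leaves.
print(len(forest.estimators_))

Each call to fit here only adds trees built from the current chunk; no existing tree is ever refined with later data, which is exactly why the resulting estimator cannot approach the tree you would get from fitting the full dataset at once.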