Hi Chris,

What you describe is a subsampling method (where trees are
built on subsets of the learning set) rather than an online random forest. In
my experience, subsampling is usually a very satisfying strategy,
giving results that are often as good as if trees had been built on
the entire training set.
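
Below is a minimal sketch of that chunk-per-tree strategy as I understand it from your description. The class name ChunkedTreeEnsemble, its parameters and the held-out X_val/y_val arguments are my own assumptions, not an sklearn API. Each partial_fit() call fits one DecisionTreeClassifier on the current chunk and scores it on a fixed validation set, and only the n_estimators best trees are kept. The code also handles two caveats. First, score() should be computed on held-out data, because a tree scored on its own chunk will typically look nearly perfect. Second, a tree fit on one chunk may not have seen every class, so its predict_proba() columns have to be aligned with the full list of classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class ChunkedTreeEnsemble:
    """Hypothetical sketch: one tree per partial_fit() call, keep the N best."""

    def __init__(self, n_estimators=10, max_depth=None, random_state=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.random_state = random_state
        self.estimators_ = []  # (validation score, fitted tree) pairs
        self.classes_ = None

    def partial_fit(self, X, y, X_val, y_val, classes=None):
        if self.classes_ is None:
            if classes is None:
                raise ValueError("classes must be passed on the first call")
            self.classes_ = np.unique(classes)  # sorted, needed by searchsorted
        tree = DecisionTreeClassifier(max_depth=self.max_depth,
                                      random_state=self.random_state)
        tree.fit(X, y)
        # Score on held-out data; scoring on the training chunk is optimistic.
        score = tree.score(X_val, y_val)
        self.estimators_.append((score, tree))
        # Keep only the n_estimators trees with the best validation score.
        self.estimators_.sort(key=lambda pair: pair[0], reverse=True)
        del self.estimators_[self.n_estimators:]
        return self

    def predict_proba(self, X):
        X = np.asarray(X)
        proba = np.zeros((X.shape[0], len(self.classes_)))
        for _, tree in self.estimators_:
            # A tree may have seen only some classes in its chunk, so map
            # its predict_proba() columns onto the full class list.
            cols = np.searchsorted(self.classes_, tree.classes_)
            proba[:, cols] += tree.predict_proba(X)
        # Average the class probabilities over the kept trees.
        return proba / len(self.estimators_)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Usage: stream 5 chunks of 500 samples, validate on the last 500.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_val, y_val = X[2500:], y[2500:]
model = ChunkedTreeEnsemble(n_estimators=3, random_state=0)
for start in range(0, 2500, 500):
    model.partial_fit(X[start:start + 500], y[start:start + 500],
                      X_val, y_val, classes=[0, 1])
print(model.predict(X_val[:5]))
```

Note that the validation set is kept in memory for every call, so it has to be small enough to fit alongside one chunk.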

For actual online random forests, see
http://homes.cs.washington.edu/~pedrod/papers/kdd00.pdf (and
references therein), in which the authors update the structure of the
trees as more data comes in.
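
To give an idea of how such trees grow incrementally, assume the linked paper is Domingos and Hulten's Hoeffding tree (VFDT) work. In that work, a leaf is split only once the observed difference in split quality (e.g. information gain) between the two best candidate attributes exceeds the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)). Here R is the range of the quality measure, delta is the allowed error probability and n is the number of examples seen at that leaf. A rough sketch of that test follows. The function names are hypothetical, and the tie-breaking threshold used in practice is left out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range value_range is within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the best candidate beats the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


# Information gain with 2 classes has range log2(2) = 1.
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4),
          should_split(0.5, 0.4, 1.0, 1e-7, n))
```

The bound shrinks as 1/sqrt(n), so a leaf simply waits for more examples when the two best candidates are close.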

Best,
Gilles

On 14 April 2014 17:28, Chris Spencer <chriss...@gmail.com> wrote:
> Would anyone be interested in an implementation of an ensemble decision-tree
> classifier that can be trained incrementally, or know why one is not already
> included in sklearn?
>
> I have a dataset that can be segmented very cleanly by a decision tree, but
> it's too large to fit into memory, so I couldn't directly apply the
> DecisionTreeClassifier or RandomForestClassifier, since they don't implement
> partial_fit().
>
> So I wrote a simple implementation that maintains a forest of
> DecisionTreeClassifier instances, training a new instance every time
> partial_fit() is called, keeping the N best according to the value returned
> by score(). The methods predict() and predict_proba() work similarly to
> those in the other ensemble tree classifiers.
>
> In my initial tests, performance and accuracy look good, but obviously it
> will likely never be more accurate than a single DecisionTreeClassifier when
> all your data can fit into memory. Does anyone know of any other potential
> caveats to this approach?
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
