Hi Isaak,

There is a good review of methods for online random forests here:
https://arxiv.org/pdf/1302.4853.pdf

In fact, it turns out that keeping a "window" of trees is not the best way to do it. Usually the trees are grown at the same time as the data arrive, see
http://lrs.icg.tugraz.at/pubs/saffari_olcv_09.pdf

Adapting the ensembles API to online learning looks like hard work, but you can open a PR to discuss it.

Nicolas

On 9 Jun 2016 9:06 am, <[email protected]> wrote:
> hi nicolas,
> excuse me, didn't mean to drop this thread for so long.
>
>> There is a paper from the same authors as iforest but for streaming
>> data: http://ijcai.org/Proceedings/11/Papers/254.pdf
>>
>> For now it is not cited enough (24) to satisfy the sklearn
>> requirements. Waiting for more citations, this could be a nice
>> addition to sklearn-contrib.
>
> agreed, I started on a weak implementation of hstree but it is not
> scikit-learn compatible, let's see what happens...
> it would be nice to see some guidance here, maybe a new splitter will
> have to be added?
>
>> Otherwise, we could imagine extending iforest to streaming data by
>> building new trees when data come (and removing the oldest ones),
>> prediction still being based on the average depth of the forest. I'm
>> not sure this heuristic could be merged on scikit-learn, since it is
>> not based on well-cited papers. In the same time, it is a natural and
>> simple extension of iforest to streaming data...
>>
>> Any opinion on it?
>
> It is, as I thought, a simple extension - my first naive approach was
> to use the 'warm_start' attribute of the BaseBagging parent class to
> preserve older estimators and then, in the 'partial_fit' method, we
> have a loop which popped off some n-number of estimators before
> calling the original 'fit' method again on incoming data - adding new
> estimators to the ensemble.
> We run into the problem of concept drift. Is this the way you'd
> implement this? if not, how would you approach?
>
> thanks so much for reading,
> isaak
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
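For reference, here is a minimal sketch of the sliding-window heuristic discussed in this thread, under stated assumptions. Each call to partial_fit grows a few new isolation trees on a random subsample of the incoming batch. Once the window is full, the oldest trees are evicted. Scores come from the average normalised depth, s = 2^(-E[h(x)]/c(psi)), as in the original iForest paper. It is written in plain NumPy rather than against scikit-learn's IsolationForest/BaseBagging internals. SlidingWindowIForest, partial_fit, score, max_trees and trees_per_batch are hypothetical names, not an existing API. As noted above, this window approach is a heuristic, not one of the better-studied online forest methods.

```python
import math
from collections import deque

import numpy as np

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """iForest's c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, rng, depth, max_depth):
    """Grow one isolation tree with random feature / random threshold splits."""
    n = X.shape[0]
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    q = rng.integers(X.shape[1])
    lo, hi = X[:, q].min(), X[:, q].max()
    if lo == hi:  # cannot split on a constant feature
        return ("leaf", n)
    p = rng.uniform(lo, hi)
    mask = X[:, q] < p
    return ("split", q, p,
            build_tree(X[mask], rng, depth + 1, max_depth),
            build_tree(X[~mask], rng, depth + 1, max_depth))


def path_length(x, node):
    """Depth of the leaf reached by x, plus c(size) for the unbuilt subtree below it."""
    depth = 0
    while node[0] == "split":
        _, q, p, left, right = node
        node = left if x[q] < p else right
        depth += 1
    return depth + c_factor(node[1])


class SlidingWindowIForest:
    """Hypothetical streaming iForest: only the max_trees most recent trees are kept."""

    def __init__(self, max_trees=100, trees_per_batch=10, max_samples=256, seed=0):
        self.trees = deque(maxlen=max_trees)  # full deque drops the oldest tree on append
        self.trees_per_batch = trees_per_batch
        self.max_samples = max_samples
        self.rng = np.random.default_rng(seed)

    def partial_fit(self, X):
        """Grow trees_per_batch new trees on random subsamples of this batch."""
        X = np.asarray(X, dtype=float)
        if X.shape[0] < 2:
            raise ValueError("need at least 2 samples per batch")
        psi = min(self.max_samples, X.shape[0])
        max_depth = math.ceil(math.log2(psi))
        for _ in range(self.trees_per_batch):
            idx = self.rng.choice(X.shape[0], size=psi, replace=False)
            self.trees.append((psi, build_tree(X[idx], self.rng, 0, max_depth)))
        return self

    def score(self, X):
        """Anomaly score in (0, 1); values closer to 1 are more anomalous."""
        if not self.trees:
            raise RuntimeError("call partial_fit first")
        X = np.asarray(X, dtype=float)
        out = np.empty(X.shape[0])
        for i, x in enumerate(X):
            # normalise each tree's depth by c(psi) of the subsample it was grown on
            ratios = [path_length(x, tree) / c_factor(psi) for psi, tree in self.trees]
            out[i] = 2.0 ** (-np.mean(ratios))
        return out


rng = np.random.default_rng(42)
model = SlidingWindowIForest(max_trees=50, trees_per_batch=10, seed=1)
for _ in range(8):  # 8 batches x 10 trees = 80 grown, only the newest 50 kept
    model.partial_fit(rng.normal(size=(300, 2)))

print(len(model.trees))  # 50
print(model.score([[0.0, 0.0], [6.0, 6.0]]))  # the far-away point should score higher
```

In this sketch the window length (max_trees / trees_per_batch batches) is the knob that trades stability against how quickly older data are forgotten, which is where concept drift shows up.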
