Re: [scikit-learn] partial_fit implementation for IsolationForest

donkey-hotei Thu, 09 Jun 2016 06:06:07 -0700

hi nicolas,
excuse me, didn't mean to drop this thread for so long.

There is a paper from the same authors as iforest but for streaming
data: http://ijcai.org/Proceedings/11/Papers/254.pdf

For now it is not cited enough (24) to satisfy the sklearn
requirements. Waiting for more citations, this could be a nice
addition to sklearn-contrib.

agreed, I started on a weak implementation of hstree but it is not scikit-learn compatible.

let's see what happens...

it would be nice to see some guidance here, maybe a new splitter will have to be added?

Otherwise, we could imagine extending iforest to streaming data by
building new trees when data come (and removing the oldest ones),
prediction still being based on the average depth of the forest. I'm
not sure this heuristic could be merged into scikit-learn, since it is
not based on well-cited papers. At the same time, it is a natural and
simple extension of iforest to streaming data...

Any opinion on it?

It is, as I thought, a simple extension - my first naive approach was to
use the 'warm_start' attribute of the BaseBagging parent class to
preserve older estimators and then, in the 'partial_fit' method, have a
loop which popped off some number n of estimators before calling the
original 'fit' method again on incoming data, adding new estimators to
the ensemble.

We run into the problem of concept drift. Is this the way you'd implement this? If not, how would you approach it?
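
To make the idea concrete, here is a rough, standard-library-only sketch of the "drop the oldest trees, grow new ones on the incoming batch" heuristic. It is not scikit-learn's IsolationForest and does not touch BaseBagging or 'warm_start'. The class name SlidingIsolationForest, the n_replace parameter and the helper functions are all made up for illustration. Scoring follows the original iforest paper (higher means more anomalous), which is the opposite sign convention to scikit-learn's score_samples.

```python
import math
import random
from collections import deque


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Grow one isolation tree: random feature, random split value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return ("node", feat, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual correction for leaf size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feat, split, left, right = tree
    child = left if x[feat] < split else right
    return path_length(child, x, depth + 1)


class SlidingIsolationForest:
    """Hypothetical streaming variant: a fixed-size queue of trees."""

    def __init__(self, n_estimators=100, n_replace=10, max_samples=256, seed=0):
        self.n_estimators = n_estimators
        self.n_replace = n_replace          # trees swapped out per batch
        self.max_samples = max_samples
        self.rng = random.Random(seed)
        # maxlen makes append() evict the oldest tree once the queue is full
        self.trees = deque(maxlen=n_estimators)

    def partial_fit(self, X):
        """X is a list of equal-length numeric tuples (one batch)."""
        if len(X) < 2:
            raise ValueError("need at least two points per batch")
        # First batch fills the ensemble; later batches replace the oldest trees.
        n_new = self.n_replace if self.trees else self.n_estimators
        size = min(self.max_samples, len(X))
        max_depth = math.ceil(math.log2(size))
        for _ in range(n_new):
            sample = self.rng.sample(X, size)
            self.trees.append(build_tree(sample, 0, max_depth, self.rng))
        # Simplification: normalise with the most recent subsample size.
        self._c = c_factor(size)
        return self

    def score(self, x):
        """Anomaly score in (0, 1), based on average depth over all trees."""
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / self._c)
```

Calling partial_fit on each incoming batch keeps the ensemble size constant while the oldest trees age out. How many trees to replace per batch (n_replace here) is the knob that trades stability against reacting to concept drift, and I have no principled way to choose it.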


thanks so much for reading,
isaak
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
