I think your idea is an excellent candidate for scikit-learn-contrib

https://github.com/scikit-learn-contrib/scikit-learn-contrib

__________________________________________________________________________________________
Dale Smith | Macy's Systems and Technology | IFS eCommerce | Data Science and 
Capacity Planning
| 5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.sm...@macys.com

From: scikit-learn 
[mailto:scikit-learn-bounces+dale.t.smith=macys....@python.org] On Behalf Of 
Nicolas Goix
Sent: Thursday, May 26, 2016 8:51 AM
To: Scikit-learn user and developer mailing list
Subject: Re: [scikit-learn] partial_fit implementation for IsolationForest

Hello Isaak,

There is a paper from the same authors as iforest but for streaming data: 
http://ijcai.org/Proceedings/11/Papers/254.pdf

So far it has too few citations (24) to meet the scikit-learn inclusion 
requirements. While it gathers more citations, it could be a nice addition to 
sklearn-contrib.

Otherwise, we could imagine extending iforest to streaming data by building new
trees as data arrive (and removing the oldest ones), with prediction still
based on the average depth over the forest. I'm not sure this heuristic could be
merged into scikit-learn, since it is not based on well-cited papers. At the
same time, it is a natural and simple extension of iforest to streaming data...
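
A minimal sketch of that heuristic, under stated assumptions: the class name
SlidingWindowIsolationForest and its parameters trees_per_batch and max_batches
are hypothetical and not part of scikit-learn. Each incoming batch trains a small
IsolationForest, and only the most recent ones are kept. The sketch averages the
per-forest score_samples output, which only approximates averaging the raw tree
depths over one large forest.

```python
from collections import deque

import numpy as np
from sklearn.ensemble import IsolationForest


class SlidingWindowIsolationForest:
    """Hypothetical sketch, not a scikit-learn estimator."""

    def __init__(self, trees_per_batch=25, max_batches=4, random_state=0):
        self.trees_per_batch = trees_per_batch
        self.max_batches = max_batches
        self.random_state = random_state
        # A bounded deque drops the oldest sub-forest automatically.
        self.forests_ = deque(maxlen=max_batches)
        self.n_batches_seen_ = 0

    def partial_fit(self, X):
        # Build new trees from the incoming batch only.
        forest = IsolationForest(
            n_estimators=self.trees_per_batch,
            random_state=self.random_state + self.n_batches_seen_,
        )
        self.forests_.append(forest.fit(X))
        self.n_batches_seen_ += 1
        return self

    def score_samples(self, X):
        # Average the per-forest scores; lower means more anomalous.
        return np.mean([f.score_samples(X) for f in self.forests_], axis=0)


# Usage on synthetic data (made up for illustration).
rng = np.random.RandomState(42)
model = SlidingWindowIsolationForest()
for _ in range(6):
    model.partial_fit(rng.normal(size=(256, 2)))

# One point near the bulk of the data, one far away from it.
scores = model.score_samples(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

After six batches only the four most recent sub-forests remain, so the memory
footprint stays bounded and old data stop influencing the scores.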

Any opinion on it?

Nicolas

2016-05-26 13:32 GMT+02:00 Arthur Mensch 
<arthur.men...@inria.fr<mailto:arthur.men...@inria.fr>>:

Hi Isaak,

You may have a look at MiniBatchKMeans and MiniBatchDictionaryLearning, which 
both provide this API. At the moment, partial_fit should fit a single mini-batch 
to the estimator and update its inner attributes accordingly. 
During the first partial_fit call, you should take care of the various memory 
allocations that the estimator needs.
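
To make the difference between fit and partial_fit concrete, here is a small
sketch using MiniBatchKMeans. The toy data, the batch size of 100 and the
variable names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
# Toy data: two well-separated blobs (made up for illustration).
X = np.vstack([rng.normal(0.0, 0.5, size=(500, 2)),
               rng.normal(5.0, 0.5, size=(500, 2))])
rng.shuffle(X)  # shuffle rows so every mini-batch mixes both blobs

# Batch learning: a single call sees the whole data set.
batch_model = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3).fit(X)

# Incremental learning: each partial_fit call sees one mini-batch and
# updates the fitted attributes (such as cluster_centers_) in place.
# The first call also initializes those attributes.
online_model = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)
for start in range(0, X.shape[0], 100):
    online_model.partial_fit(X[start:start + 100])

centers = online_model.cluster_centers_
```

The estimator keeps its state between calls, so the loop could just as well
consume batches arriving from a stream instead of slices of an in-memory array.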

Please feel free to create a pull request whenever you think your code is ready 
for review.

Good luck!
On 26 May 2016 13:14, 
<donkey-ho...@cryptolab.net<mailto:donkey-ho...@cryptolab.net>> wrote:
hello scikit-learn devs,

After following the work on IsolationForest so far and testing it on a 
real-world problem here, we've found this model to be very promising for anomaly 
detection. However, at present, IsolationForest only fits data in batch mode, 
even though it may be well suited to incremental online learning, since one 
could subsample recent history and progressively drop older estimators.

I'd like to contribute this feature, but being new to ML and scikit-learn I'm 
curious how I should start making a quick & dirty version to see how this may 
work. Are there other good examples where one could see the difference between 
.fit and .partial_fit in other models?

thanks
isaak y.
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org<mailto:scikit-learn@python.org>
https://mail.python.org/mailman/listinfo/scikit-learn