Hey
You have the classical AD datasets derived from the KDD Cup 99 data: SA, SF, http, smtp.
The original dataset has a high proportion of anomalies, around 80%
(originally it was a classification task between different types of
intrusion).
SA is obtained by selecting all the normal data and a small proportion of
abnormal data to obtain an anomaly rate of 1%.
SF is obtained by selecting the data whose attribute logged_in is positive.
From SF, you obtain http and smtp according to the 'service' attribute.
I think the original description of how to obtain these datasets is in
http://cs.fit.edu/~pkc/id/related/yamanishi-kdd00.pdf
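
To make the construction above concrete, here is a minimal sketch in plain Python. It assumes each record is a dict with 'label', 'logged_in' and 'service' keys and that normal traffic is labelled "normal"; those names, the toy data and the helper functions are illustrative assumptions, not the actual KDD file layout or the procedure from the paper.

```python
import random


def make_sa(records, anomaly_rate=0.01, seed=0):
    """Keep every normal record plus a random subset of anomalies so that
    anomalies make up roughly `anomaly_rate` of the result."""
    normal = [r for r in records if r["label"] == "normal"]
    abnormal = [r for r in records if r["label"] != "normal"]
    # Solve k / (len(normal) + k) = anomaly_rate for k.
    k = round(anomaly_rate * len(normal) / (1.0 - anomaly_rate))
    k = min(k, len(abnormal))
    rng = random.Random(seed)
    return normal + rng.sample(abnormal, k)


def make_sf(records):
    """Keep only the records whose logged_in attribute is positive."""
    return [r for r in records if r["logged_in"] == 1]


def by_service(records, service):
    """Keep only the records of one service, e.g. 'http' or 'smtp'."""
    return [r for r in records if r["service"] == service]


def toy_records(n_normal=990, n_abnormal=4000, seed=0):
    """Hypothetical stand-in for the real data (about 80% anomalies)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_normal + n_abnormal):
        records.append({
            "label": "normal" if i < n_normal else "attack",
            "logged_in": rng.randint(0, 1),
            "service": rng.choice(["http", "smtp", "ftp"]),
        })
    return records


data = toy_records()
sa = make_sa(data)
sf = make_sf(data)
http, smtp = by_service(sf, "http"), by_service(sf, "smtp")
```

The number of anomalies kept in make_sa comes from solving k / (n_normal + k) = rate, so the rate is measured on the final dataset rather than on the normal data alone.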
From the UCI repository you have forestcover (normal class: 2, abnormal: 4) and
shuttle (1 vs {2,3,4,6,7}).
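
As a rough sketch of that relabelling (the helper name and the toy labels are made up; the class codes are the ones quoted in this message): keep the normal class and the listed abnormal classes, drop everything else, and mark anomalies with 1.

```python
def binarize(labels, normal, abnormal):
    """Return (kept indices, 0/1 anomaly flags) for a multi-class label list.

    Samples whose class is neither `normal` nor in `abnormal` are dropped.
    """
    abnormal = set(abnormal)
    keep, is_anomaly = [], []
    for i, y in enumerate(labels):
        if y == normal:
            keep.append(i)
            is_anomaly.append(0)
        elif y in abnormal:
            keep.append(i)
            is_anomaly.append(1)
    return keep, is_anomaly


# Toy labels only; a shuttle-style split would look like this.
toy_labels = [1, 1, 2, 5, 4, 7, 1, 3]
keep, is_anomaly = binarize(toy_labels, normal=1, abnormal={2, 3, 4, 6, 7})
```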
Lots of other AD datasets exist: you typically take a classification
problem and choose two classes such that one has a very small number of
samples (for instance, choose a multi-class problem and take the largest and the
smallest class). Having labeled data allows you to compute a
score.
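
A small sketch of that recipe, in plain Python. The choice of ROC AUC as the score is my assumption (any ranking metric would do), and the function names and toy data are illustrative. The labels need at least two distinct classes.

```python
from collections import Counter


def largest_vs_smallest(labels):
    """Keep the most and least frequent classes; the rare one is the anomaly.

    Returns (kept indices, 0/1 anomaly flags).
    """
    ordered = Counter(labels).most_common()
    normal, abnormal = ordered[0][0], ordered[-1][0]
    keep = [i for i, y in enumerate(labels) if y in (normal, abnormal)]
    is_anomaly = [1 if labels[i] == abnormal else 0 for i in keep]
    return keep, is_anomaly


def roc_auc(is_anomaly, scores):
    """Probability that a random anomaly gets a higher score than a random
    normal sample; ties count for one half. Quadratic, fine for small data."""
    pos = [s for s, y in zip(scores, is_anomaly) if y == 1]
    neg = [s for s, y in zip(scores, is_anomaly) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy multi-class labels: class "a" is the largest, class "c" the smallest.
toy_labels = ["a"] * 50 + ["b"] * 20 + ["c"] * 3
keep, is_anomaly = largest_vs_smallest(toy_labels)
```

The scores passed to roc_auc are whatever your detector outputs, with higher meaning more abnormal; the ground-truth flags are only used for evaluation, never for fitting.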
You can also use a synthetic data generator, Mulcross.
All these datasets are used for instance in
http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf?q=isolation-forest
with descriptions and references.
Best,
Nicolas
2015-04-30 15:13 GMT+02:00 Luca Puggini <lucapug...@gmail.com>:
> Dear all,
> I was wondering if you could suggest some typical datasets used to compare
> various anomaly detection methods.
>
> Thanks a lot,
> Luca
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>