Thanks a lot,
this will be very useful.
On Sat, May 2, 2015 at 12:06 PM, Nicolas Goix <goix.nico...@gmail.com>
wrote:
> Hey
>
> The classical anomaly detection (AD) datasets come from the KDD Cup 99:
> SA, SF, http, smtp.
>
> The original dataset has a high proportion of anomalies, around 80%
> (originally it was a classification task between different types of
> intrusion).
>
> SA is obtained by selecting all the normal data and a small proportion of
> abnormal data, to obtain an anomaly rate of 1%.
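
A minimal sketch of that subsampling step, assuming a boolean label vector
where True marks an abnormal record. The function name make_sa_indices, the
seed parameter and the toy labels are hypothetical and only illustrate the
idea; this is not the exact procedure used to build the published SA set.

```python
import numpy as np


def make_sa_indices(is_anomaly, anomaly_rate=0.01, seed=0):
    """Keep every normal record plus enough abnormal ones to reach anomaly_rate."""
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    normal_idx = np.flatnonzero(~is_anomaly)
    abnormal_idx = np.flatnonzero(is_anomaly)

    # Solve n_abnormal / (n_normal + n_abnormal) = anomaly_rate for n_abnormal.
    n_abnormal = int(round(anomaly_rate * len(normal_idx) / (1.0 - anomaly_rate)))
    n_abnormal = min(n_abnormal, len(abnormal_idx))

    rng = np.random.RandomState(seed)
    chosen = rng.choice(abnormal_idx, size=n_abnormal, replace=False)
    return np.sort(np.concatenate([normal_idx, chosen]))


# Toy labels (assumption): 990 normal and 4000 abnormal records, i.e. roughly
# the 80% anomaly proportion mentioned for the original data.
labels = np.array([False] * 990 + [True] * 4000)
sa_idx = make_sa_indices(labels)
sa_rate = labels[sa_idx].mean()
```

Here sa_idx indexes the rows to keep, and sa_rate is the resulting fraction
of abnormal records in the subsample.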
>
> SF is obtained by selecting the records whose logged_in attribute is
> positive.
>
> From SF, you obtain http and smtp according to the ‘service’ attribute.
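
The two filters above can be sketched with boolean masks. Only the attribute
names logged_in and service come from the description; the toy values and
the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy records standing in for the real KDD Cup 99 columns.
logged_in = np.array([1, 0, 1, 1, 0, 1])
service = np.array(["http", "http", "smtp", "ftp", "smtp", "http"])

# SF: records whose logged_in attribute is positive.
sf_mask = logged_in > 0

# http and smtp: the SF records with the matching service value.
http_mask = sf_mask & (service == "http")
smtp_mask = sf_mask & (service == "smtp")

sf_idx = np.flatnonzero(sf_mask)
http_idx = np.flatnonzero(http_mask)
smtp_idx = np.flatnonzero(smtp_mask)
```

Note that http and smtp are subsets of SF, so a record with service "http"
but logged_in equal to 0 is excluded from both.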
>
> I think the original description of how to obtain these datasets is in
>
> http://cs.fit.edu/~pkc/id/related/yamanishi-kdd00.pdf
>
>
> From the UCI repository you have forestcover (normal class: 2, abnormal:
> 4) and shuttle (1 vs {2,3,4,6,7}).
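
As a rough sketch of how those class choices turn into an AD labelling: keep
only the records belonging to the chosen classes and mark the abnormal ones.
The helper one_vs_set and the toy label vectors are hypothetical; the class
numbers are the ones quoted above.

```python
import numpy as np


def one_vs_set(y, normal_class, abnormal_classes):
    """Return (kept row indices, boolean anomaly flags for the kept rows)."""
    y = np.asarray(y)
    is_normal = y == normal_class
    is_abnormal = np.isin(y, list(abnormal_classes))
    keep = np.flatnonzero(is_normal | is_abnormal)
    return keep, is_abnormal[keep]


# Toy label vectors (assumption), not the real datasets.
shuttle_y = np.array([1, 1, 5, 2, 1, 7, 4, 1])
cover_y = np.array([2, 1, 2, 4, 3, 2])

# shuttle: class 1 is normal, {2,3,4,6,7} are abnormal, anything else is dropped.
shuttle_keep, shuttle_anom = one_vs_set(shuttle_y, 1, {2, 3, 4, 6, 7})

# forestcover: class 2 is normal, class 4 is abnormal.
cover_keep, cover_anom = one_vs_set(cover_y, 2, {4})
```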
>
> Lots of other AD datasets exist: you typically take a classification
> problem and choose two classes such that one has a very small number of
> samples (for instance, choose a multi-class problem and take the largest
> and the smallest class). Having labeled data allows you to compute a
> score.
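
A small sketch of that recipe, assuming integer class labels and some
detector that outputs one anomaly score per record (higher means more
abnormal). The helper names, the toy labels and the toy scores are
hypothetical, and ROC AUC is used here only as one possible choice of score.

```python
import numpy as np


def largest_vs_smallest(y):
    """Keep the largest class as normal and the smallest class as abnormal."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    normal = classes[np.argmax(counts)]
    abnormal = classes[np.argmin(counts)]
    keep = np.flatnonzero((y == normal) | (y == abnormal))
    return keep, y[keep] == abnormal


def roc_auc(is_anomaly, scores):
    """Probability that a random anomaly outscores a random normal record.

    Ties count for one half; this pairwise form is quadratic in memory,
    so it is only meant for small illustrative inputs.
    """
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[is_anomaly]
    neg = scores[~is_anomaly]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy multi-class labels (assumption): class 0 is largest, class 2 is smallest.
y = np.array([0] * 6 + [1] * 3 + [2] * 1)
keep, is_anomaly = largest_vs_smallest(y)

# Made-up anomaly scores for the kept records, standing in for a real detector.
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.9])
auc = roc_auc(is_anomaly, scores)
```

The labels are used only for evaluation here; the detector itself would be
fitted without them.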
>
>
> You can also use a synthetic data generator, Mulcross.
>
> All these datasets are used for instance in
> http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf?q=isolation-forest
> with descriptions and references.
>
> Best,
> Nicolas
>
>
> 2015-04-30 15:13 GMT+02:00 Luca Puggini <lucapug...@gmail.com>:
>
>> Dear all,
>> I was wondering if you could suggest some typical datasets used to
>> compare various anomaly detection methods.
>>
>> Thanks a lot,
>> Luca
>>
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>