Hi Nicolas,

I have done some work on outlier detection; let me know how I can help once
you start this project.

Thanks,

via Mobile.

Dayvid V. R. Oliveira
PhD candidate in Computer Science - UFPE
MSc in Computer Science - UFPE
Computer Engineer - UFPE
On Jun 23, 2014 12:52 PM, "Nicolas Goix" <goix.nico...@gmail.com> wrote:

> Hello,
>
> The following study evaluates four outlier detection algorithms on the
> DARPA 1998 data set: unsupervised SVM, the LOF approach, the NN
> (nearest-neighbor) approach and a Mahalanobis-based approach:
>
>
> http://static.msi.umn.edu/rreports/2003/72.pdf
>
>
> They find the LOF approach to be the most effective, followed by the NN
> approach, unsupervised SVM and the Mahalanobis-based approach.
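For reference, a Mahalanobis-based score fits in a few lines of pure Python. This is a minimal sketch for 2-D data only, assuming the score is the Mahalanobis distance of each point to the sample mean under the sample covariance (the study's exact variant may differ); `mahalanobis_scores` is a hypothetical helper name, not a scikit-learn API.

```python
import math


def mahalanobis_scores(points):
    """Mahalanobis distance of each 2-D point to the sample mean.

    Assumes at least 3 points and a non-singular sample covariance.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form inverse of the 2x2 covariance matrix
    det = sxx * syy - sxy * sxy
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        scores.append(math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy))
    return scores


if __name__ == "__main__":
    # Points on the line y = x, plus two points that break the correlation
    data = [(-3, -3), (-2, -2), (-1, -1), (0, 0), (1, 1), (2, 2), (3, 3),
            (1, -1), (-1, 1)]
    for point, score in zip(data, mahalanobis_scores(data)):
        print(point, round(score, 3))
```

On this data (1, -1) and (-1, 1) get the largest scores even though (3, 3) and (-3, -3) are farther from the mean in Euclidean distance, because they go against the x/y correlation captured by the covariance.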
>
>
> However, like the other distance-based approaches, the LOF algorithm does
> not scale well to high-dimensional data: as the dimension grows, the data
> become sparse and all the points become almost equidistant.
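The equidistance effect is easy to observe numerically. The sketch below, assuming points drawn uniformly from the unit hypercube (an illustrative choice, not taken from the cited papers), measures the relative contrast (max - min) / min of the distances from one random query point to the others; `relative_contrast` is a hypothetical helper name, and `math.dist` needs Python 3.8 or later.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the Euclidean distances from one random query
    point to n_points points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

The contrast should shrink sharply as the dimension grows, which is what makes nearest-neighbor distances, and hence LOF-style density estimates, less informative in high dimension.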
>
>
> Aggarwal & Yu take this effect into account, and their evolutionary
> algorithm scales well to high-dimensional data (its complexity is almost
> linear in the dimension and linear in the number of data points).
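The core quantity in the Aggarwal & Yu paper is the sparsity coefficient of a k-dimensional cube D built from equi-depth ranges: with N points, phi ranges per attribute and f = 1/phi, S(D) = (n(D) - N f^k) / sqrt(N f^k (1 - f^k)), where n(D) is the number of points in D and strongly negative values flag abnormally sparse cubes. The sketch below computes it for 2-D cubes only; the exhaustive scan over attribute pairs is a swapped-in simplification standing in for the paper's evolutionary search, and all function names are hypothetical.

```python
import itertools
import math


def equi_depth_bins(values, phi):
    """Assign each value to one of phi equi-depth ranges (0..phi-1) by rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * phi // len(values)
    return bins


def sparsity_coefficient(count, n, phi, k):
    """S(D) = (n(D) - N f^k) / sqrt(N f^k (1 - f^k)), with f = 1 / phi."""
    fk = (1.0 / phi) ** k
    expected = n * fk
    return (count - expected) / math.sqrt(expected * (1 - fk))


def sparsest_2d_cubes(data, phi, top=3):
    """Score every non-empty 2-D cube by brute force, sparsest first.

    Exhaustive scan (NOT the paper's evolutionary search); only practical
    for a handful of attributes.
    """
    n, dims = len(data), len(data[0])
    bins = [equi_depth_bins([row[j] for row in data], phi) for j in range(dims)]
    scored = []
    for a, b in itertools.combinations(range(dims), 2):
        members = {}
        for i in range(n):
            members.setdefault((bins[a][i], bins[b][i]), []).append(i)
        for (ra, rb), idx in members.items():
            s = sparsity_coefficient(len(idx), n, phi, 2)
            scored.append((s, (a, ra), (b, rb), idx))
    scored.sort()  # most negative sparsity coefficient first
    return scored[:top]


if __name__ == "__main__":
    # 98 points on the diagonal plus two points that break the x/y correlation
    data = [(float(i), float(i)) for i in range(1, 99)] + [(0.0, 99.0), (99.0, 0.0)]
    for s, range_a, range_b, idx in sparsest_2d_cubes(data, phi=4, top=2):
        print(round(s, 2), range_a, range_b, idx)
```

With phi=4 and 100 points, each 2-D cube is expected to hold 100/16 = 6.25 points, so the two planted points, each alone in an off-diagonal cube, should come out as the two sparsest cubes.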
>
>
> The same holds for iForest, which does not rely on any distance.
> Furthermore, the authors' empirical evaluation shows that iForest
> outperforms ORCA, one-class SVM and LOF in terms of AUC and processing
> time.
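To make the "no distance" point concrete, here is a minimal pure-Python sketch of the isolation idea from the iForest paper: random axis-aligned splits on small subsamples, a height limit of ceil(log2(psi)), and the score 2^(-E[h(x)] / c(psi)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful binary-search-tree search. It is a simplified illustration, not the authors' implementation; the function names, the subsample size of 64 and the choice of skipping constant attributes are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree using random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    # Only attributes that still vary within this node can be split on
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return ("leaf", len(points))
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree):
    """Depth at which x reaches a leaf, plus c(size) for unresolved leaves."""
    depth = 0
    while tree[0] == "node":
        _, dim, split, left, right = tree
        tree = left if x[dim] < split else right
        depth += 1
    return depth + c(tree[1])


def iforest_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1]; values close to 1 suggest outliers.

    Assumes len(data) >= 2 so that c(psi) > 0.
    """
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores


if __name__ == "__main__":
    # 10 x 10 grid in [0, 0.9]^2 plus one far-away point
    data = [((i % 10) * 0.1, (i // 10) * 0.1) for i in range(100)] + [(5.0, 5.0)]
    scores = iforest_scores(data)
    print(max(range(len(data)), key=scores.__getitem__), round(scores[-1], 3))
```

Only comparisons against random thresholds are used; no distance between points is ever computed, and the cost per tree depends on the subsample size rather than on the full data set.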
>
>
> Regards,
>
> Nicolas
>
>
> 2014-06-20 13:37 GMT+02:00 Alexandre Gramfort <
> alexandre.gramf...@telecom-paristech.fr>:
>
>> hi,
>>
>> Nicolas, could you give some numbers on the impact of these different
>> works, to get an idea of which one might be of most interest to the
>> sklearn community? Do they all scale to medium or large datasets?
>>
>> Is there anybody on the list with experience with these tools?
>>
>> Best,
>> Alex
>>
>>
>> On Fri, Jun 13, 2014 at 3:42 PM, Nicolas Goix <goix.nico...@gmail.com>
>> wrote:
>> > Hello,
>> >
>> > This is my first post to the list. I have recently been in touch with
>> > Alexandre Gramfort, and I would be very interested in exploring some
>> > outlier/anomaly detection algorithms, before eventually putting them
>> > behind a scikit-learn-compatible API (with a view to eventually merging
>> > them).
>> >
>> > I'm not particularly familiar with the state of the art regarding the
>> > efficiency of such algorithms; I have only read some surveys and other
>> > literature on the topic, and my conclusion is that exploring the
>> > following classical methods would be productive:
>> >
>> >
>> > - density-based algorithms: LOF (Local Outlier Factor) and its variations
>> > (other algorithms using relative density/k-NN) such as COF
>> > (Connectivity-based Outlier Factor), ODIN (Outlier Detection using
>> > Indegree Number) and LOCI (Local Correlation Integral).
>> >
>> > LOF: http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
>> >
>> > COF: http://www.cse.cuhk.edu.hk/~adafu/Pub/pakdd02.pdf
>> >
>> > ODIN: ftp://193.167.42.127/pub/franti/papers/Hautamaki/P2.pdf
>> >
>> > LOCI: http://www.dtic.mil/dtic/tr/fulltext/u2/a461085.pdf
>> >
>> >
>> > - high-dimensional approach: the "Aggarwal and Yu algorithm"
>> >
>> >
>> > http://www.researchgate.net/publication/2401320_Outlier_Detection_for_High_Dimensional_Data/file/e0b49525c3e5f60b5e.pdf
>> >
>> >
>> > - iForest (Isolation Forest), which seems very interesting because it
>> > does not rely on any distance or density measure.
>> >
>> >
>> > http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf?q=isolation
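The density-based family listed above starts from LOF, which can be sketched by brute force directly from its definitions in the Breunig et al. paper: k-distance, reachability distance reach-dist_k(p, o) = max(k-distance(o), d(p, o)), local reachability density lrd_k(p) = 1 / mean of reach-dist_k(p, o) over p's neighbors, and LOF_k(p) = mean of lrd_k(o) / lrd_k(p) over those neighbors. This is a quadratic-time sketch assuming no duplicate points and exactly k neighbors per point (the paper also includes ties at the k-distance); `lof_scores` is a hypothetical name.

```python
import math


def lof_scores(points, k=3):
    """Brute-force Local Outlier Factor, O(n^2) distance computations.

    Simplifications: the neighborhood is exactly the k nearest points
    (ties at the k-distance are not expanded), and duplicate points are
    assumed absent so reachability distances are never all zero.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point, excluding the point itself
    neighbors = [sorted((j for j in range(n) if j != i), key=dist[i].__getitem__)[:k]
                 for i in range(n)]
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def lrd(i):
        # reach-dist_k(i, o) = max(k-distance(o), d(i, o))
        reach = [max(k_distance[o], dist[i][o]) for o in neighbors[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[o] for o in neighbors[i]) / (k * lrds[i]) for i in range(n)]


if __name__ == "__main__":
    dense = [(a * 0.1, b * 0.1) for a in range(3) for b in range(3)]
    sparse = [(10.0 + a, 10.0 + b) for a in range(3) for b in range(3)]
    data = dense + sparse + [(0.6, 0.1)]  # last point: local outlier
    for point, score in zip(data, lof_scores(data, k=3)):
        print(point, round(score, 2))
```

Here the isolated point (0.6, 0.1) should get a LOF well above 1 while the points of both grids stay close to 1, even though its 3rd-nearest-neighbor distance (about 0.41) is smaller than that of any point in the sparse grid (at least 1.0); a plain k-NN distance score would rank the sparse grid first.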
>> >
>> >
>> > So please let me know if some of these algorithms (or others) might be
>> > of particular interest.
>> >
>> > Anyway, I'd be very glad to get any feedback on it.
>> >
>> >
>> > Cheers,
>> >
>> > Nicolas
>> >
>> >
>> >
>> ------------------------------------------------------------------------------
>> > HPCC Systems Open Source Big Data Platform from LexisNexis Risk
>> Solutions
>> > Find What Matters Most in Your Big Data with HPCC Systems
>> > Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
>> > Leverages Graph Analysis for Fast Processing & Easy Data Exploration
>> > http://p.sf.net/sfu/hpccsystems
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>>
>>
>>
>
>
>
>
>