> > For personal reasons I am writing a function to compute the outlier
> > measure from random forests:
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers
>
> > With a little more work I can include the function in the sklearn
> > random forest class.
>
> Do you have a guesstimate of the amount of code it would add to the
> codebase?
>
>
Only a few lines of code would be needed, I think around 15.
I need to know how we would include this in the code: should it be a new
class or a new method of the random forest class? If you are interested,
you can send me a private email about that.
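
To make the size estimate concrete, the computation described on Breiman's
page can be sketched roughly as below. This is only a minimal sketch under
stated assumptions, not the code proposed for sklearn: `rf_outlier_measure`
is a hypothetical helper name, the toy data is made up, the self-proximity
is kept in the sum, and the scaling uses the unscaled median absolute
deviation. It relies only on the existing `apply` method of the forest to
get the leaf reached by each sample in each tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_outlier_measure(forest, X, y):
    """Outlier measure from forest proximities (hypothetical helper, not sklearn API)."""
    y = np.asarray(y)
    # apply() gives, for every sample, the index of the leaf it reaches in each tree.
    leaves = forest.apply(X)
    n_samples, n_trees = leaves.shape

    # proximity[i, k] = fraction of trees in which samples i and k share a leaf.
    proximity = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        proximity += leaves[:, t][:, None] == leaves[:, t][None, :]
    proximity /= n_trees

    scores = np.empty(n_samples)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Sum of squared proximities to the cases of the same class.
        # Assumption: the self-proximity (always 1) is kept, so this is never zero.
        within = (proximity[np.ix_(idx, idx)] ** 2).sum(axis=1)
        raw = n_samples / within
        # Centre and scale within the class using the median and the median
        # absolute deviation (assumption: unscaled MAD, replaced by 1 if it is 0).
        median = np.median(raw)
        mad = np.median(np.abs(raw - median))
        scores[idx] = (raw - median) / (mad if mad > 0 else 1.0)
    return scores


# Toy data (made up for illustration): two tight clusters plus one case
# labelled as class 0 that sits in the middle of the class 1 cluster.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(30, 2)),   # class 0
    rng.normal(5.0, 0.5, size=(30, 2)),   # class 1
    [[5.0, 5.0]],                         # planted outlier, labelled class 0
])
y = np.array([0] * 30 + [1] * 30 + [0])

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = rf_outlier_measure(forest, X, y)
print(scores[-1], scores[:30].max())
```

The dense n_samples x n_samples proximity matrix is the main cost here, so
a real implementation would probably need something smarter for large
datasets.
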
> Also, is there a canonical paper on this approach that we could read?
>
>
The method is relatively new. I work in manufacturing, where the most
commonly used techniques are based on multivariate control charts.
I have found that in many situations this method works better. I know
that it has already been used to detect network intrusions:
http://scholar.google.com/scholar?hl=en&q=random+forest+outlier+detection&btnG=&as_sdt=1%2C5&as_sdtp=
> > Is the community interested? Should I do it?
>
> As always, it's very hard to judge whether a method should be included. I
> personally think that outlier detection is something very important, and
> I'd like to see more in scikit-learn. However, we need to choose the
> methods that bring the most benefit to users to solve that problem. Thus
> we need to be convinced that the situations in which the method works
> well are reasonably common. This requires understanding these
> situations, and that's usually a bit hard.
>
>
My impression is that at the moment the support for outlier detection in
sklearn is very poor. Basic techniques that are commonly used in my field
are not implemented, for example multivariate control charts and
self-organizing maps.
This method has the advantage that it can be introduced quickly, with a few
lines of code, as a method of the random forest class.
> Thanks a lot for that proposal!
>
> Gaël
>
>
>
I use sklearn 8 hours a day so I am happy to help :-)
Just let me know.
Best,
Luca
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general