[Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Luca Puggini
Hi, for personal reasons I am writing a function to compute the outlier measure from random forests: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers. With a little more work I can include the function in the sklearn random forest class. Is the community interested? Should I

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
On Mon, Sep 08, 2014 at 10:05:58AM +0100, Luca Puggini wrote: for personal reasons I am writing a function to compute the outlier measure from random forests: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#outliers. With a little more work I can include the function in the

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
Hi Luca, This may not be the fastest implementation, but random forest proximities can be computed quite straightforwardly in Python given our 'apply' function. See for instance https://github.com/glouppe/phd-thesis/blob/master/scripts/ch4_proximity.py#L12 From a personal point of view, I never
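For readers following the thread, here is a minimal sketch of the idea, not the linked script itself. It assumes a fitted scikit-learn forest and uses its `apply` method to get the leaf index of each sample in each tree; the proximity of two samples is the fraction of trees in which they share a leaf. The helper names `proximity_matrix` and `outlier_measure` are hypothetical. The outlier score follows one reading of Breiman's page (raw score = n / sum of squared within-class proximities, then centered on the class median); excluding the self-proximity and scaling by the median absolute deviation are assumptions of this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def proximity_matrix(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]  # 1 where two samples share a leaf
    return prox / n_trees


def outlier_measure(prox, y):
    """Breiman-style outlier score, computed separately within each class."""
    n_samples = prox.shape[0]
    scores = np.zeros(n_samples)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        within = prox[np.ix_(idx, idx)].copy()
        np.fill_diagonal(within, 0.0)  # assumption: ignore self-proximity
        sum_sq = (within ** 2).sum(axis=1)
        raw = n_samples / np.maximum(sum_sq, 1e-12)  # guard against division by zero
        med = np.median(raw)
        dev = np.median(np.abs(raw - med))  # assumption: median absolute deviation
        scores[idx] = (raw - med) / max(dev, 1e-12)
    return scores


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
prox = proximity_matrix(forest, X)
scores = outlier_measure(prox, y)
```

The proximity matrix is dense, so memory grows quadratically with the number of samples; for large datasets a sparse or blockwise computation would be needed.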

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Peter Prettenhofer
+1 -- looks like a very handy 3-liner :) 2014-09-08 16:14 GMT+02:00 Gilles Louppe g.lou...@gmail.com: Hi Luca, This may not be the fastest implementation, but random forest proximities can be computed quite straightforwardly in Python given our 'apply' function. See for instance

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Sam Nicholls
+1 for seeing this implemented. I feel it would be a useful addition for work we do here that involves use of random forests. On Mon, Sep 8, 2014 at 3:14 PM, Gilles Louppe g.lou...@gmail.com wrote: Hi Luca, This may not be the fastest implementation, but random forest proximities can be

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Mathieu Blondel
This could be a transform method added to RandomForestClassifier / RandomForestRegressor. On Mon, Sep 8, 2014 at 11:14 PM, Gilles Louppe g.lou...@gmail.com wrote: Hi Luca, This may not be the fastest implementation, but random forest proximities can be computed quite straightforwardly in

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
On Mon, Sep 08, 2014 at 11:49:26PM +0900, Mathieu Blondel wrote: This could be a transform method added to RandomForestClassifier / RandomForestRegressor. I don't think that it can be a transform, because currently transform cannot modify y (and that's really a problem). G

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gael Varoquaux
I don't think that it can be a transform, because currently transform cannot modify y (and that's really a problem). Brainfart! I hadn't thought about the problem well enough. Please disregard the previous message. G

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
I am rather -1 on making this a transform. There are many ways to come up with proximity measures in forests. In fact, I don't think Breiman's is particularly well designed. On 8 September 2014 16:52, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Mon, Sep 08, 2014 at 11:49:26PM +0900,

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Mathieu Blondel
On Mon, Sep 8, 2014 at 11:55 PM, Gilles Louppe g.lou...@gmail.com wrote: I am rather -1 on making this a transform. There are many ways to come up with proximity measures in forests. In fact, I don't think Breiman's is particularly well designed. I think this is actually an argument for

Re: [Scikit-learn-general] outlier measure random forest

2014-09-08 Thread Gilles Louppe
Variants include:
- Taking into account common internal nodes reached by two samples. In this sense, proximity takes into account the paths that are common and not only the leaves.
- Normalizing the counts by the number of training samples within the common leaves (instead of simply counting +1
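As a rough illustration of the first variant (shared internal nodes rather than shared leaves only), the sketch below uses the forest's `decision_path` method, which returns a sparse indicator of every node each sample passes through. The helper name `path_proximity` is hypothetical, and dividing by the longer of the two paths is an assumption made here to keep values in (0, 1]; the thread does not specify a normalization.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def path_proximity(forest, X):
    """Proximity based on the number of tree nodes two samples both traverse."""
    # indicator[i, k] is nonzero if sample i passes through node k
    # (nodes of all trees are concatenated along the columns).
    indicator, _ = forest.decision_path(X)
    shared = (indicator @ indicator.T).toarray().astype(float)
    path_len = shared.diagonal()  # total nodes visited by each sample
    # Assumption: normalize by the longer of the two paths.
    return shared / np.maximum(path_len[:, None], path_len[None, :])


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
path_prox = path_proximity(forest, X)
```

Because every sample passes through the root of every tree, this measure is never exactly zero, unlike the leaf-only proximity.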