Re: [Scikit-learn-general] outlier measure random forest

Gilles Louppe Mon, 08 Sep 2014 08:18:07 -0700

Variants include:

- Taking into account common internal nodes reached by two samples. In
this sense, proximity takes into account the paths that are common and
not only the leaves.
- Normalizing the counts by the number of training samples within the
common leaves (instead of simply counting +1 for all common leaves).
Indeed, detecting that two samples belong to the same node may not be
a good proxy for their proximity if there are many other samples
within the same node. By contrast, if there are very few samples
reaching a common leaf, then this constitutes a better clue for the
proximity of the two samples.
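
As a rough illustration of the two variants above, here is a minimal
sketch in plain Python. It is not scikit-learn code and not the exact
formulation from the thesis: the function names, the toy data and the
normalizations (dividing by the number of trees, and by the longer path
length) are assumptions made for the example. The leaf indices are
assumed to be given per sample and per tree, which is the kind of output
a forest's apply(X) method returns in scikit-learn.

```python
from collections import Counter


def leaf_sizes_from_training(train_leaves):
    # train_leaves[i][t] is the leaf reached by training sample i in tree t.
    # Returns one Counter per tree: leaf id -> number of training samples.
    n_trees = len(train_leaves[0])
    return [Counter(row[t] for row in train_leaves) for t in range(n_trees)]


def leaf_proximity(leaves_a, leaves_b):
    # Baseline: +1 for every tree in which both samples share a leaf.
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


def size_weighted_proximity(leaves_a, leaves_b, leaf_sizes):
    # Second variant: a shared leaf counts 1 / (training samples in it),
    # so small leaves give stronger evidence than crowded ones.
    total = 0.0
    for t, (a, b) in enumerate(zip(leaves_a, leaves_b)):
        if a == b:
            total += 1.0 / leaf_sizes[t][a]
    return total / len(leaves_a)


def path_proximity(paths_a, paths_b):
    # First variant: credit the common root-to-leaf prefix, not only the
    # leaf. paths_x[t] is the list of node ids visited in tree t.
    total = 0.0
    for pa, pb in zip(paths_a, paths_b):
        common = 0
        for na, nb in zip(pa, pb):
            if na != nb:
                break
            common += 1
        total += common / max(len(pa), len(pb))
    return total / len(paths_a)


# Toy data (hypothetical): 4 training samples, 2 trees.
train_leaves = [[3, 5], [3, 6], [3, 5], [4, 6]]
sizes = leaf_sizes_from_training(train_leaves)

plain = leaf_proximity(train_leaves[0], train_leaves[2])
weighted = size_weighted_proximity(train_leaves[0], train_leaves[2], sizes)

# Hypothetical decision paths for two samples in the same 2 trees.
paths_a = [[0, 1, 3], [0, 2, 5]]
paths_b = [[0, 1, 3], [0, 2, 6]]
by_path = path_proximity(paths_a, paths_b)
```

In this toy case samples 0 and 2 share both leaves, so the plain count
cannot tell the crowded leaf (3 samples) from the smaller one (2
samples), while the size-weighted score can. The path-based score still
gives partial credit in the second tree, where the two samples only
diverge at the last split.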


On 8 September 2014 17:03, Mathieu Blondel <math...@mblondel.org> wrote:
>
>
> On Mon, Sep 8, 2014 at 11:55 PM, Gilles Louppe <g.lou...@gmail.com> wrote:
>>
>> I am rather -1 on making this a transform. There are many ways to come
>> up with proximity measures in forests -- in fact, I don't think
>> Breiman's is particularly well designed.
>
>
> I think this is actually an argument for non-inclusion in the scikit.
> Perhaps an example based on the one in your thesis would suffice.
>
> What other methods exist?
>
> M.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general