Variants include:

- Taking into account common internal nodes reached by two samples. In
  this sense, proximity takes into account the paths that are common and
  not only the leaves.
- Normalizing the counts by the number of training samples within the
  common leaves (instead of simply counting +1 for all common leaves).
  Indeed, detecting that two samples belong to the same node may not be
  a good proxy for their proximity if there are many other samples
  within the same node. By contrast, if there are very few samples
  reaching a common leaf, then this constitutes a better clue for the
  proximity of the two samples.
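To make the two variants concrete, here is a rough sketch in plain Python, next to the usual leaf co-occurrence count for comparison. It is only an illustration under assumptions of mine: the function names, the toy data, and the normalizations (dividing by the number of trees, and by the longer of the two paths) are not taken from any reference implementation. Leaf indices are assumed to come as one row per sample and one column per tree, which is the layout `forest.apply(X)` returns in scikit-learn; the root-to-leaf paths are assumed to have been extracted separately.

```python
from collections import Counter


def leaf_sizes(train_leaves):
    """Per tree, count how many training samples fall in each leaf."""
    n_trees = len(train_leaves[0])
    return [Counter(row[t] for row in train_leaves) for t in range(n_trees)]


def breiman_proximity(leaves_a, leaves_b):
    """Fraction of trees in which both samples end up in the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def size_weighted_proximity(leaves_a, leaves_b, sizes):
    """A shared leaf counts 1 / (training samples in that leaf), not +1.

    Averaging over trees is an assumption made for this sketch.
    """
    total = 0.0
    for t, (a, b) in enumerate(zip(leaves_a, leaves_b)):
        if a == b:
            total += 1.0 / sizes[t][a]
    return total / len(leaves_a)


def shared_path_proximity(paths_a, paths_b):
    """Average fraction of the root-to-leaf path the two samples share.

    Each path is a list of node ids starting at the root. Dividing by the
    longer path is an assumption made for this sketch.
    """
    total = 0.0
    for path_a, path_b in zip(paths_a, paths_b):
        shared = 0
        for node_a, node_b in zip(path_a, path_b):
            if node_a != node_b:
                break
            shared += 1
        total += shared / max(len(path_a), len(path_b))
    return total / len(paths_a)


# Hypothetical toy data: 4 training samples, 2 trees, leaf ids per tree.
train_leaves = [[1, 2], [1, 2], [1, 3], [4, 3]]
sizes = leaf_sizes(train_leaves)

# Hypothetical root-to-leaf paths of two samples in the same 2 trees.
toy_paths_a = [[0, 1, 3], [0, 2]]
toy_paths_b = [[0, 1, 4], [0, 2]]
```

On this toy data, samples 0 and 1 share a leaf in both trees, so the plain count gives 1.0, while the size-weighted version discounts the crowded leaf of the first tree. The two toy paths end in different leaves of the first tree but still share most of the path, which is what the first variant rewards.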

On 8 September 2014 17:03, Mathieu Blondel <math...@mblondel.org> wrote:
>
> On Mon, Sep 8, 2014 at 11:55 PM, Gilles Louppe <g.lou...@gmail.com> wrote:
>>
>> I am rather -1 on making this a transform. There are many ways to come
>> up with proximity measures in forests -- in fact, I don't think
>> Breiman's is particularly well designed.
>
> I think this is actually an argument for non-inclusion in the scikit.
> Perhaps an example based on the one in your thesis would suffice.
>
> What other methods exist?
>
> M.