Hi Daniel,

Sorry for the late reply. I think conditional trees would be a good
addition. However, coincidentally, the tree module is currently
undergoing another rewrite:
https://github.com/scikit-learn/scikit-learn/pull/5041
As for feature selection, I have difficulty giving advice, given that I
don't know the problem at all. I do know that neither RF- nor
L1-norm-based feature selection methods do well with correlated
features. Conditional independence methods like the graphical LASSO and
its friends (group graphical LASSO, fused graphical LASSO, ...) can
help with this issue. I do not know of anyone who has conditional
dependence trees in their repo; given the lack of other replies, it may
not be a thing that exists, unfortunately.

Take a look at my rewrite of _tree.pyx, and see if you can determine
how difficult it would be to extend to conditional trees. I can answer
questions if you have any! (Two rough sketches, one of graphical LASSO
screening and one of subsampling without replacement, are pasted below
your quoted message.)

Jacob

On Fri, Jul 31, 2015 at 6:00 AM, Daniel Homola
<daniel.homol...@imperial.ac.uk> wrote:

> Hi all,
>
> I was checking the archive of the mailing list to see if there were
> any attempts in the past to incorporate Conditional Inference Trees
> into the Ensemble module. I found a mail from Theo Strinopoulos
> (07-07-2013) asking whether this would be welcomed as a contribution
> of his. Gilles Louppe replied that it would be very much so, but that
> the Tree module was under rewrite and Theo should wait a bit longer.
>
> Does anyone know what happened with this initiative? I've been working
> on RF-based feature selection methods for the past few months, and
> realized that what several people have pointed out earlier might be
> true :) Namely, that impurity-based split criteria like Gini and
> entropy favor variables with larger cardinality, and that RF isn't
> terribly good at dealing with correlated predictors.
>
> This is what they found here:
> http://www.biomedcentral.com/1471-2105/8/25
> and I think this is what Gilles' thesis concludes as well (please
> correct me if I've misunderstood your work):
> http://www.montefiore.ulg.ac.be/~glouppe/pdf/phd-thesis.pdf
>
> Gilles proposed that limiting the max_depth of the trees might help,
> but in my experiments neither this nor using ExtraTrees made a
> substantial difference.
>
> The paper above shows with simulation studies that using Conditional
> Inference Trees as base learners in the ensemble might ameliorate
> these issues, if it is coupled with subsampling without replacement
> instead of the traditional bootstrapping.
>
> So I was wondering if either of these two things is available in some
> bleeding-edge form, or perhaps in someone's private branch?
>
> When I naively checked the Ensemble and Tree code on GitHub, hoping I
> could contribute and implement these, I must admit I shied away from
> it quite quickly due to my lack of C and Cython knowledge.
>
> Thanks for any help in advance!
>
> Cheers,
> Daniel
>
> ps.: I know R's party package has ctrees, but it's non-parallel and
> really slow, so it would be amazing if scikit-learn had this, I think.
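To make the graphical LASSO suggestion concrete, here is a rough,
untested sketch of screening for features that are conditionally
dependent on the target, by fitting a sparse inverse covariance on
[X, y]. The toy data and the alpha value are made up, so treat it as
illustration only:

import numpy as np
from sklearn.covariance import GraphLasso  # renamed GraphicalLasso in later releases

# Toy data (made up): feature 1 is strongly correlated with feature 0,
# but only features 0 and 3 actually drive the response y.
rng = np.random.RandomState(0)
n, p = 200, 10
X = rng.randn(n, p)
X[:, 1] = X[:, 0] + 0.5 * rng.randn(n)
y = X[:, 0] + X[:, 3] + 0.5 * rng.randn(n)

# Fit a sparse inverse covariance on [X, y]. Under a Gaussian
# assumption, a zero in the last row of the precision matrix means that
# feature is conditionally independent of y given all other features.
Z = np.column_stack([X, y])
model = GraphLasso(alpha=0.2).fit(Z)
selected = np.flatnonzero(np.abs(model.precision_[-1, :-1]) > 1e-6)
print("features conditionally dependent on y:", selected)

This is the property that plain correlation screening or L1 selection
on correlated features lacks: the redundant copy (feature 1) carries no
information about y once feature 0 is known, so its precision entry
should be driven toward zero.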
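And on the subsampling-without-replacement half of the recipe in your
mail: you can approximate that part today by wrapping a plain decision
tree in BaggingClassifier with bootstrap=False. The splits are still
Gini-based, so this is not a conditional inference forest, and the
0.632 subsample fraction below is just a placeholder:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Forest-like ensemble in which each tree sees a subsample drawn
# WITHOUT replacement (bootstrap=False) and considers a random subset
# of features at each split: a random forest minus the bootstrap.
forest = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=100,
    max_samples=0.632,   # placeholder: roughly the bootstrap's expected
    bootstrap=False,     # fraction of unique samples
    random_state=0,
).fit(X, y)
print("train accuracy:", forest.score(X, y))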