Hi Daniel

Sorry for the late reply. I think conditional inference trees would be a
good addition. Coincidentally, though, the tree module is currently
undergoing another rewrite:
https://github.com/scikit-learn/scikit-learn/pull/5041

As for feature selection, I have difficulty giving advice since I don't
know the problem at all. I do know that both RF and L1-norm-based
feature selection methods handle correlated features poorly. Conditional
independence methods like the graphical LASSO and its relatives (group
graphical LASSO, fused graphical LASSO, etc.) can help with this issue.
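
For instance, scikit-learn ships a graphical LASSO estimator (GraphLasso
in sklearn.covariance; renamed GraphicalLasso in later releases). A
minimal sketch, assuming you only want the conditional independence
structure of your features, i.e. the sparsity pattern of the estimated
precision matrix:

    import numpy as np
    from sklearn.covariance import GraphLasso  # GraphicalLasso in newer releases

    rng = np.random.RandomState(0)
    X = rng.randn(200, 10)
    X[:, 1] = X[:, 0] + 0.1 * rng.randn(200)   # plant one strongly correlated pair

    model = GraphLasso(alpha=0.05).fit(X)
    # Non-zero off-diagonal entries of the precision matrix mark pairs of
    # features that are conditionally dependent given all the others.
    print((np.abs(model.precision_) > 1e-4).astype(int))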

I do not know of anyone who has conditional inference trees in their
repo. Given the lack of other replies, an implementation may simply not
exist yet, unfortunately.

Take a look at my rewrite of _tree.pyx, and see if you can determine how
difficult it would be to extend to conditional trees. I can answer
questions if you have any!

Jacob

On Fri, Jul 31, 2015 at 6:00 AM, Daniel Homola <
daniel.homol...@imperial.ac.uk> wrote:

> Hi all,
>
> I was checking the mailing list archive to see whether there were any
> past attempts to incorporate Conditional Inference Trees into the
> Ensemble module. I found a mail from Theo Strinopoulos (07-07-2013)
> asking whether such a contribution would be welcome. Gilles Louppe
> replied that it very much would be, but that the Tree module was under
> rewrite and Theo should wait a bit longer.
>
> Does anyone know what happened with this initiative? I've been working
> on RF-based feature selection methods over the past few months, and
> realized that what several people have pointed out earlier might be
> true :) Namely, that impurity-based split criteria like Gini and
> entropy favor variables with larger cardinality, and that RF isn't
> terribly good at dealing with correlated predictors.
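>
> A minimal sketch of the cardinality effect, assuming a plain
> RandomForestClassifier with defaults (nothing here is specific to my
> problem): two completely uninformative features, one binary and one
> continuous, still end up with very different importances.
>
>     import numpy as np
>     from sklearn.ensemble import RandomForestClassifier
>
>     rng = np.random.RandomState(0)
>     n = 1000
>     y = rng.randint(2, size=n)            # random labels, no signal at all
>     X = np.column_stack([
>         rng.randint(2, size=n),           # low cardinality: 2 distinct values
>         rng.rand(n),                      # high cardinality: ~n distinct values
>     ])
>
>     rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
>     print(rf.feature_importances_)        # the continuous feature dominates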
>
> This is what they found here: http://www.biomedcentral.com/1471-2105/8/25
> and I think Gilles's thesis concludes the same (please correct me if
> I've misunderstood your work):
> http://www.montefiore.ulg.ac.be/~glouppe/pdf/phd-thesis.pdf
>
> Gilles proposed that limiting the max_depth of the trees might help,
> but in my experiments neither that nor using ExtraTrees made a
> substantial difference.
>
> The paper above shows through simulation studies that using Conditional
> Inference Trees as base learners in the ensemble can ameliorate these
> issues, provided it is coupled with subsampling without replacement
> instead of the traditional bootstrapping.
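>
> For the subsampling part, I suppose something close is already doable
> with BaggingClassifier; a hedged sketch using plain decision trees as a
> stand-in until conditional inference trees exist (so this only swaps
> the sampling scheme, not the split criterion):
>
>     from sklearn.datasets import make_classification
>     from sklearn.ensemble import BaggingClassifier
>     from sklearn.tree import DecisionTreeClassifier
>
>     X, y = make_classification(n_samples=500, n_features=20, random_state=0)
>
>     # bootstrap=False together with max_samples < 1.0 draws each tree's
>     # training set without replacement, as the paper recommends.
>     ensemble = BaggingClassifier(
>         base_estimator=DecisionTreeClassifier(max_features="sqrt"),
>         n_estimators=100,
>         max_samples=0.632,
>         bootstrap=False,
>         random_state=0,
>     ).fit(X, y)
>     print(ensemble.score(X, y))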
>
> So I was wondering if either of these two things is available in some
> bleeding-edge form, or maybe in someone's private branch?
>
> When I naively checked the Ensemble and Tree code on GitHub, hoping I
> could contribute and implement these, I must admit I shied away quite
> quickly due to my lack of C and Cython knowledge.
>
> Thanks for any help in advance!
>
> Cheers,
> Daniel
>
> PS: I know R's party package has ctree, but it's non-parallel and
> really slow, so I think it would be amazing if scikit-learn had this.
>
>