Dear all,

I've found several articles expressing concerns about using Random Forest with highly correlated features (e.g. http://www.biomedcentral.com/1471-2105/9/307).
I was wondering whether this drawback of the RF algorithm could somehow be remedied using scikit-learn methods. The paper linked above comes with an R package, but it's known to be super slow. When I thought about this problem (quite naively, as I'm at best an enthusiastic beginner in ML), it occurred to me that further randomisation in the tree building might help. So would using ExtraTreesClassifier provide some protection against this issue?

Thanks a lot in advance for any suggestions!

Cheers,
Daniel
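P.S. Here is a minimal sketch of how one might compare the two ensembles on correlated features. It assumes a made-up synthetic dataset, not anything from the paper: column 0 decides the label, column 1 is a noisy copy of column 0 (so the two are highly correlated), and column 2 is pure noise. It fits RandomForestClassifier and ExtraTreesClassifier and prints the impurity-based `feature_importances_`, so you can see how each model splits the importance between the two correlated columns. The sample size, noise level and `n_estimators` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Hypothetical toy data (assumption, for illustration only):
# x0 decides the label, x1 is a noisy copy of x0, x2 is unrelated noise.
rng = np.random.RandomState(0)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = (x0 > 0).astype(int)

importances = {}
for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              ExtraTreesClassifier(n_estimators=200, random_state=0)):
    model.fit(X, y)
    # Impurity-based (mean decrease in impurity) importances, normalised to sum to 1.
    importances[type(model).__name__] = model.feature_importances_
    print(type(model).__name__, np.round(model.feature_importances_, 3))

print("corr(x0, x1) =", round(np.corrcoef(x0, x1)[0, 1], 3))
```

This only looks at the impurity-based importances. Whether the extra randomisation actually helps would have to be judged on whatever measure the linked paper is concerned with.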