Dear all,

I've found several articles expressing concerns about using Random 
Forest with highly correlated features (e.g. 
http://www.biomedcentral.com/1471-2105/9/307).

I was wondering whether this drawback of the RF algorithm could somehow be 
remedied using scikit-learn methods. The paper linked above comes with an R 
package, but it is known to be a very slow solution to the problem.

When I thought about this (quite naively, as I'm at best an enthusiastic 
beginner in ML), it seemed that further randomisation in the tree building 
might help. So would using ExtraTreesClassifier provide some protection 
against this issue?
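
One way to probe this empirically is a small synthetic experiment. The sketch below is only an illustration under stated assumptions: the data set (one informative signal, two noisy near-copies of it, and three pure-noise features), the noise levels, and the forest sizes are all made up for the example and do not come from the linked paper. It fits RandomForestClassifier and ExtraTreesClassifier on the same data and prints the impurity-based feature_importances_, so the way each model spreads importance across the correlated group can be compared side by side.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rng = np.random.RandomState(0)
n_samples = 500

# Hypothetical data: one informative signal, two noisy near-copies of it
# (so columns 0-2 are highly correlated), and three pure-noise columns.
signal = rng.normal(size=n_samples)
X = np.column_stack([
    signal,
    signal + 0.05 * rng.normal(size=n_samples),
    signal + 0.05 * rng.normal(size=n_samples),
    rng.normal(size=(n_samples, 3)),
])
# Binary target driven by the signal plus some label noise.
y = (signal + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Fit both ensembles with the same settings and collect their importances.
importances = {}
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    clf = Model(n_estimators=200, random_state=0).fit(X, y)
    importances[Model.__name__] = clf.feature_importances_
    print(Model.__name__, np.round(clf.feature_importances_, 3))
```

This does not settle the question by itself: it only shows how the two models share out importance among the correlated columns on one toy data set, and results on real data may differ.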

Thanks a lot for any suggestions in advance!

Cheers,
Daniel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
