Hey,
I spent quite some time on this problem.
1) If you are only interested in prediction, this is not a big problem. You
can preprocess the data with PCA to decorrelate the features.
2) If you want to understand which variables are important,
I suggest reading the paper "Understanding variable importances in
forests of randomized trees".
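For option 1, here is a minimal sketch of what I mean, not a tuned
solution. The synthetic dataset, the number of components and the choice of
RandomForestClassifier are all assumptions made for illustration; on real
data you would tune n_components by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (an assumption for illustration): 5 informative features
# plus 10 redundant ones that are linear combinations of the informative ones,
# so many columns are strongly correlated.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_redundant=10,
    random_state=0,
)

# PCA projects the data onto uncorrelated components before the forest sees it.
# n_components=10 is an arbitrary choice here.
model = make_pipeline(
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 5-fold cross-validated accuracy; PCA is refit inside each training fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Note that after PCA the forest works on components rather than the original
variables, which is why this only helps when prediction is the goal.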
In general I suggest using ExtraTreesClassifier with max_depth=3 or 5.
There is some debate about whether max_features=1 or
max_features=n_features works better (I would go for the latter).
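A rough sketch of how you could compare the two max_features settings on
your own data. The synthetic dataset and n_estimators=200 are assumptions for
illustration only, and the importances you get will depend on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data (an assumption for illustration). With shuffle=False the
# columns are ordered: 3 informative, then 4 redundant (correlated with the
# informative ones), then 3 pure-noise features.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=4,
    shuffle=False,
    random_state=0,
)

importances = {}
# Compare max_features=1 against max_features=n_features with shallow trees.
for max_features in (1, X.shape[1]):
    forest = ExtraTreesClassifier(
        n_estimators=200,
        max_depth=3,
        max_features=max_features,
        random_state=0,
    )
    forest.fit(X, y)
    # Impurity-based importances, averaged over the trees of the forest.
    importances[max_features] = forest.feature_importances_

for max_features, values in importances.items():
    print(max_features, np.round(values, 3))
```

Looking at how the importance is shared between the informative and the
redundant columns under each setting gives a feel for how correlation
affects the ranking.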
I ran into some problems with the R package you mention, so
I would not use it.
I hope this helps.
Best,
Luca
On Mon, Apr 27, 2015 at 4:48 PM, Daniel Homola <
daniel.homol...@imperial.ac.uk> wrote:
> Dear all,
>
> I've found several articles expressing concerns about using Random
> Forest with highly correlated features (e.g.
> http://www.biomedcentral.com/1471-2105/9/307).
>
> I was wondering if this drawback of the RF algorithm could be somehow
> remedied using scikit-learn methods? The above linked paper has an R
> package but it's known to offer a super-slow solution to the problem.
> When I thought about this problem (quite naively as I'm at best an
> enthusiastic beginner in ML) I thought maybe further randomisation in
> the tree building might help with this. So would using
> ExtraTreesClassifier provide some protection against this issue?
>
> Thanks a lot for any suggestions in advance!
>
> Cheers,
> Daniel
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>