Hi Luca,

I asked because I'm interested in the second problem. Thanks a lot for the paper and the suggested parameters; I'll read it and try them!

Has anyone tested these assumptions/parameters rigorously on simulated data, or is this more of a feeling?

Thanks again for the quick and informative response!
Best,
Daniel

On 27/04/15 20:43, Luca Puggini wrote:
Hey,
I spent quite some time on this problem.

1) If you are interested only in prediction, this is not a big problem. You can preprocess the data with PCA.

2) If you want to understand which variables are important,
I suggest you read the paper "Understanding variable importances in forests of randomized trees". In general I suggest using ExtraTreesClassifier with max_depth=3 or 5. There is some debate about whether it is better to use max_features=1 or max_features=n_features (I would go with the latter).
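
A minimal sketch of the two options above, assuming a recent scikit-learn (one that provides sklearn.model_selection). The synthetic dataset, n_estimators, n_components and random_state are illustrative assumptions, not values from this thread; only max_depth=3 and the two max_features settings come from the suggestion above. In scikit-learn, max_features=None means "use all features", i.e. n_features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (an assumption for illustration): 5 informative features
# followed by 5 redundant ones that are linear combinations of the
# informative ones, so the columns are correlated.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=5,
    shuffle=False, random_state=0,
)

# 1) Prediction only: decorrelate the inputs with PCA before the forest.
pca_rf = make_pipeline(
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pca_scores = cross_val_score(pca_rf, X, y, cv=5)
print("PCA + RF mean CV accuracy: %.3f" % pca_scores.mean())

# 2) Variable importance: shallow extremely randomized trees, fitted once
# for each of the two max_features settings under discussion.
importances = {}
for max_features in (1, None):  # None means all features (n_features)
    forest = ExtraTreesClassifier(
        n_estimators=500, max_depth=3, max_features=max_features,
        random_state=0,
    )
    forest.fit(X, y)
    importances[max_features] = forest.feature_importances_
    print("max_features=%s:" % max_features,
          np.round(forest.feature_importances_, 3))
```

Comparing the two printed importance vectors on data where you know which columns are informative is one way to see how each setting spreads importance across correlated features.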

I ran into some problems with the R package that you mention, so I would not use it.

I hope this can help.
Best,
Luca

On Mon, Apr 27, 2015 at 4:48 PM, Daniel Homola <daniel.homol...@imperial.ac.uk> wrote:

    Dear all,

    I've found several articles expressing concerns about using Random
    Forest with highly correlated features (e.g.
    http://www.biomedcentral.com/1471-2105/9/307).

    I was wondering if this drawback of the RF algorithm could be somehow
    remedied using scikit-learn methods? The above linked paper has an R
    package but it's known to offer a super-slow solution to the problem.
    When I thought about this problem (quite naively, as I'm at best an
    enthusiastic beginner in ML) I thought maybe further randomisation in
    the tree building might help with this. So would using
    ExtraTreesClassifier provide some protection against this issue?

    Thanks a lot for any suggestions in advance!

    Cheers,
    Daniel

    
------------------------------------------------------------------------------
    One dashboard for servers and applications across
    Physical-Virtual-Cloud
    Widest out-of-the-box monitoring support with 50+ applications
    Performance metrics, stats and reports that give you Actionable
    Insights
    Deep dive visibility with transaction tracing using APM Insight.
    http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    <mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general





