Hi Luca,
I asked because I'm interested in the second problem.
Thanks a lot for the paper and the suggested params, I'll read it and
try them!
Has anyone tested these assumptions/parameters rigorously on simulated
data, or is this more of a feeling?
Thanks again for the quick and informative response!
Best,
Daniel
On 27/04/15 20:43, Luca Puggini wrote:
Hey,
I spent quite some time on this problem.
1) If you are interested only in prediction, this is not a big problem.
You can preprocess the data with PCA.
2) If you want to understand which variables are important,
I suggest you read the paper "Understanding variable importances in
forests of randomized trees".
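For point 1, a minimal sketch of what PCA preprocessing in front of a forest could look like. The synthetic dataset, the 95% variance threshold and the forest size are assumptions chosen only for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): 12 of the 20 features are linear
# combinations of the 4 informative ones, so many columns are correlated.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=4,
    n_redundant=12,
    random_state=0,
)

# PCA decorrelates the inputs before the forest sees them.
# n_components=0.95 keeps enough components to explain 95% of the variance.
model = make_pipeline(
    PCA(n_components=0.95, svd_solver="full"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Cross-validate the whole pipeline so PCA is refit on each training fold.
scores = cross_val_score(model, X, y, cv=5)
print("mean CV accuracy: %.3f" % scores.mean())
```

Note that this only helps prediction: the forest then ranks principal components, not the original variables, so it does not answer point 2.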
In general I suggest using ExtraTreesClassifier with max_depth=3
or 5. There is some debate about whether max_features=1 or
max_features=n_features is better (I would go for the latter).
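A rough sketch of how the two max_features settings could be compared on simulated data. The toy dataset (one informative feature, one near copy of it, two noise features) and the number of trees are assumptions made for illustration; in scikit-learn, max_features=None means all n_features are considered at each split.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
n_samples = 500

# x0 determines the label; x1 is a near copy of x0 (highly correlated);
# the last two columns are pure noise.
x0 = rng.normal(size=n_samples)
x1 = x0 + 0.05 * rng.normal(size=n_samples)
noise = rng.normal(size=(n_samples, 2))
X = np.column_stack([x0, x1, noise])
y = (x0 > 0).astype(int)

importances = {}
for label, max_features in (("max_features=1", 1),
                            ("max_features=n_features", None)):
    forest = ExtraTreesClassifier(
        n_estimators=200,
        max_depth=3,
        max_features=max_features,
        random_state=0,
    )
    forest.fit(X, y)
    # Impurity-based importances, normalised to sum to 1.
    importances[label] = forest.feature_importances_

for label, imp in importances.items():
    print(label, np.round(imp, 3))
```

Comparing how the importance is shared between x0 and x1 under each setting gives a first impression of how the correlated pair is treated; it is a toy check, not a rigorous evaluation.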
I ran into some problems with the R package that you
mentioned, so I would not use it.
I hope this can help.
Best,
Luca
On Mon, Apr 27, 2015 at 4:48 PM, Daniel Homola
<daniel.homol...@imperial.ac.uk> wrote:
Dear all,
I've found several articles expressing concerns about using Random
Forest with highly correlated features (e.g.
http://www.biomedcentral.com/1471-2105/9/307).
I was wondering if this drawback of the RF algorithm could be somehow
remedied using scikit-learn methods? The above linked paper has an R
package but it's known to offer a super-slow solution to the problem.
When I thought about this problem (quite naively, as I'm at best an
enthusiastic beginner in ML) I thought maybe further randomisation in
the tree building might help with this. So would using
ExtraTreesClassifier provide some protection against this issue?
Thanks a lot for any suggestions in advance!
Cheers,
Daniel
------------------------------------------------------------------------------
One dashboard for servers and applications across
Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable
Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general