[Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread Luca Puggini
Hi all, I am using random forest and extra trees importance. I am wondering if there is any method to deal with correlated variables. Consider, for example, the R party package. On page 30 of the documentation http://cran.r-project.org/web/packages/party/party.pdf a measure of 'conditional

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread Luca Puggini
Totally true, Josef, but I guess that shoesize should not contain more information than age. I was hoping not to classify it as relevant when age is in the model. @Gilles thanks a lot. I was also reading your thesis. My first impression is that while I think that max_features=1 is a very good

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 2:38 PM, Luca Puggini lucapug...@gmail.com wrote: Totally true, Josef, but I guess that shoesize should not contain more information than age. I was hoping not to classify it as relevant when age is in the model. Semi-OT for the random forest question I thought about

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 10:05 AM, Gilles Louppe g.lou...@gmail.com wrote: Hi Luca, If you want to find all relevant features, I would recommend using ExtraTreesClassifier with max_features=1 and limited depth in order to avoid this kind of bias due to estimation errors. E.g., try with

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread Gilles Louppe
Hi Luca, If you want to find all relevant features, I would recommend using ExtraTreesClassifier with max_features=1 and limited depth in order to avoid this kind of bias due to estimation errors. E.g., try with max_depth=3 to 5 or using max_leaf_nodes. Hope this helps, Gilles On 19 April
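
A minimal sketch of the setup suggested above, for readers who want to try it. The toy data is invented for illustration and only echoes the age/shoesize example from the thread: "shoesize" is a noisy copy of "age", and "noise" is unrelated to the target. The value n_estimators=500 is also an assumption; only max_features=1 and a max_depth between 3 and 5 come from the message.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
n_samples = 1000

# Hypothetical toy data (not from the thread): shoesize is strongly
# correlated with age, noise is independent of everything.
age = rng.normal(size=n_samples)
shoesize = age + 0.3 * rng.normal(size=n_samples)
noise = rng.normal(size=n_samples)
X = np.column_stack([age, shoesize, noise])

# The target depends on age only, plus some label noise.
y = (age + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

forest = ExtraTreesClassifier(
    n_estimators=500,   # assumption: enough trees for stable importances
    max_features=1,     # one randomly chosen candidate feature per split
    max_depth=3,        # shallow trees, as suggested (3 to 5)
    random_state=0,
)
forest.fit(X, y)

feature_names = ["age", "shoesize", "noise"]
importances = dict(zip(feature_names, forest.feature_importances_))
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```

With max_features=1 every feature gets a chance to be used for a split, so in this sketch both correlated variables should receive a clearly non-zero importance while the unrelated one stays near zero. That matches the stated goal of finding all relevant features; it does not by itself discount shoesize given age.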