> > The narrative docs say that max_features=n_features is a good value for
> > RandomForests.
> > As far as I know, Breiman 2001 suggests max_features =
> > log_2(n_features). I also
> > saw a claim that Breiman 2001 suggests max_features = sqrt(n_features) but I
> > couldn't find that in the paper.
> > I just tried "digits" and max_features = log_2(n_features) works better than
> > max_features = n_features. Of course, that is definitely not conclusive
> > evidence ;)
> > Is there any reference that says max_features = n_features is good?
> >
> > Also, I think this default value contradicts the beginning of the
> > narrative docs a bit,
> > since that claims "In addition, when splitting a node during the
> > construction of the tree,
> > the split that is chosen is no longer the best split among all features.
> > Instead, the split that is picked is the best split among a random
> > subset of the features."
> > Later, a recommendation on using max_features = n_features is made, but
> > no connection to the explanation above is given.
>
> Short answer: the optimal value of max_features is problem-specific.
>
> In [1], it was found experimentally that max_features=sqrt(n_features)
> was working well for classification problems, and
> max_features=n_features for regression problems. This is at least the
> case for extra-trees. For random forests, I am no longer sure; I will
> check with my advisor.

Back to you.

In the random forest manual [2], it is recommended to use
max_features=sqrt(n_features), with some warnings though:

"mtry0 = the number of variables to split on at each node. Default is
the square root of mdim. ATTENTION! DO NOT USE THE DEFAULT VALUES OF
MTRY0 IF YOU WANT TO OPTIMIZE THE PERFORMANCE OF RANDOM FORESTS. TRY
DIFFERENT VALUES-GROW 20-30 TREES, AND SELECT THE VALUE OF MTRY THAT
GIVES THE SMALLEST OOB ERROR RATE."

[2]: http://oz.berkeley.edu/users/breiman/RandomForests/cc_manual.htm
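
For what it's worth, here is a rough sketch of how one could follow that
advice with our current RandomForestClassifier (just an illustration,
assuming oob_score is enabled to get the OOB error; the candidate values
and n_estimators=30 follow the quote above):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier

    digits = load_digits()
    X, y = digits.data, digits.target
    n_features = X.shape[1]

    # Candidate values for max_features: sqrt, log2 and all features.
    candidates = [int(np.sqrt(n_features)), int(np.log2(n_features)), n_features]

    for max_features in candidates:
        forest = RandomForestClassifier(n_estimators=30,  # "grow 20-30 trees"
                                        max_features=max_features,
                                        oob_score=True,
                                        random_state=0)
        forest.fit(X, y)
        # Pick the value giving the smallest OOB error rate.
        print("max_features=%d  OOB error=%.4f"
              % (max_features, 1.0 - forest.oob_score_))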

I don't know why I had in mind that RFs should have
max_features=n_features by default. My bad.

My advisor says that log2 was indeed recommended at first in Breiman's
paper, but that sqrt was later preferred by Breiman, as [2] indeed
indicates.

What I suggest is to add a string value max_features="auto" that maps
to max_features=sqrt(n_features) for classification problems and to
max_features=n_features for regression. In the same way, we could add
max_features="sqrt" and max_features="log2" and let the user decide.
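
To make the proposal concrete, usage would look something like this
(the string values are hypothetical at this point, nothing is
implemented yet):

    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    # Proposed string values:
    #   "auto" -> sqrt(n_features) for classification, n_features for regression
    #   "sqrt" -> sqrt(n_features)
    #   "log2" -> log2(n_features)
    clf = RandomForestClassifier(n_estimators=100, max_features="auto")
    reg = RandomForestRegressor(n_estimators=100, max_features="auto")
    clf_log2 = RandomForestClassifier(n_estimators=100, max_features="log2")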

@amueller If you like, I can take care of all these changes (in that
case, I'll do it tomorrow).

Gilles
