> > The narrative docs say that max_features=n_features is a good value for
> > RandomForests.
> > As far as I know, Breiman 2001 suggests max_features =
> > log_2(n_features). I also
> > saw a claim that Breiman 2001 suggests max_features = sqrt(n_features) but I
> > couldn't find that in the paper.
> > I just tried "digits" and max_features = log_2(n_features) works better than
> > max_features = n_features. Of course that is definitely no conclusive
> > evidence ;)
> > Is there any reference that says max_features = n_features is good?
> >
> > Also, I think this default value contradicts the beginning of the
> > narrative docs a bit,
> > since that claims "In addition, when splitting a node during the
> > construction of the tree,
> > the split that is chosen is no longer the best split among all features.
> > Instead, the split that is picked is the best split among a random
> > subset of the features."
> > Later, a recommendation on using max_features = n_features is made, but
> > no connection to the explanation above is given.
>
> Short answer: the optimal value of max_features is problem-specific.
>
> In [1], it was found experimentally that max_features=sqrt(n_features)
> was working well for classification problems, and
> max_features=n_features for regression problems. This is at least the
> case for extra-trees. For random forests, I am no longer sure, I will
> check with my advisor.

Back to you.

In the random forest manual [2], it is recommended to use
max_features=sqrt(n_features), with some warnings though:

"mtry0 = the number of variables to split on at each node. Default is
the square root of mdim. ATTENTION! DO NOT USE THE DEFAULT VALUES OF
MTRY0 IF YOU WANT TO OPTIMIZE THE PERFORMANCE OF RANDOM FORESTS. TRY
DIFFERENT VALUES-GROW 20-30 TREES, AND SELECT THE VALUE OF MTRY THAT
GIVES THE SMALLEST OOB ERROR RATE."

[2]: http://oz.berkeley.edu/users/breiman/RandomForests/cc_manual.htm

I don't know why I had in mind that RFs should have
max_features=n_features by default. My bad.

My advisor says that log2 was indeed recommended at first in Breiman's
paper, but that Breiman later preferred sqrt, as [2] indicates.

What I suggest is to add a string value max_features="auto" such that
max_features=sqrt(n_features) on classification problems and
max_features=n_features on regression problems. In the same way, we
could add max_features="sqrt" and max_features="log2" and let the user
decide.

@amueller If you like, I can take care of all these changes (in that
case, I'll do it tomorrow).

Gilles

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
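
PS: To make the proposed string values concrete, here is a minimal
sketch of how they could be turned into a number of features. The
function name resolve_max_features and the is_classification flag are
hypothetical and only for illustration; they are not existing
scikit-learn API. Rounding down and never going below 1 are assumptions
as well, since the rounding rule has not been discussed yet.

```python
import math


def resolve_max_features(max_features, n_features, is_classification):
    """Turn a max_features setting into a concrete feature count.

    Hypothetical helper: the name, the signature and the rounding rule
    (round down, never below 1) are assumptions made for illustration.
    """
    if max_features is None:
        # No restriction: consider every feature at each split.
        return n_features

    if isinstance(max_features, str):
        if max_features == "auto":
            # Proposed rule: sqrt for classification, all features for regression.
            if is_classification:
                return max(1, int(math.sqrt(n_features)))
            return n_features
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError("unknown max_features value: %r" % (max_features,))

    # Otherwise treat the value as an explicit integer count.
    if not 1 <= max_features <= n_features:
        raise ValueError("max_features must be in [1, n_features]")
    return max_features


if __name__ == "__main__":
    # "digits" has 64 features per sample.
    for setting in ("auto", "sqrt", "log2", None):
        print(setting, resolve_max_features(setting, 64, is_classification=True))
```

Whatever the default ends up being, the warning in [2] still applies:
the value should be tuned per problem, for instance by comparing OOB
error rates for a few candidates.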
