On 25 March 2012 12:16, Gilles Louppe <[email protected]> wrote:
> Hi Olivier,
>
> The higher the number of estimators, the better. The more random the
> trees (e.g., the lower max_features), the more important it usually is
> to have a large forest to decrease the variance. To me, 10 is actually
> a very low default value. In my daily research, I deal with hundreds
> of trees. But yeah, it also takes longer.
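For concreteness, a minimal sketch of the kind of forest Gilles describes
(the toy dataset and the parameter values below are illustrative
assumptions, not tuned recommendations):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier

    # toy data just for illustration
    X, y = make_classification(n_samples=1000, n_features=20,
                               random_state=0)

    # More random trees (lower max_features) have higher individual
    # variance, so a larger forest is needed to average it away.
    clf = ExtraTreesClassifier(
        n_estimators=500,  # hundreds of trees rather than the default 10
        max_features=1,    # maximally random splits
        n_jobs=-1,         # use all cores: pretty much required at this scale
        random_state=0,
    ).fit(X, y)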
Indeed. I think we should put some practical scales somewhere in the
docs, maybe something along these lines. Depending on the max_depth of
the trees and the size of the dataset:

- 10+ trees: a couple of seconds or minutes of sequential CPU time,
  suitable for debugging

- 500+ trees: a couple of minutes or hours of sequential CPU time,
  suitable for getting interesting results (requires multi-core
  computation in practice)

- 5000+ trees: a couple of hours or days of sequential CPU time,
  suitable for appearing on the leaderboards of machine learning
  challenges (requires distributed computation in practice)

> By the way I am curious, what kind of dataset are you testing those
> methods on? :)

I used
http://scikit-learn.org/dev/datasets/index.html#the-olivetti-faces-dataset
which is a stupid choice given the large number of classes, as Peter
explained.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
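P.S. For anyone who wants to reproduce: a minimal loading sketch for that
dataset, assuming the fetch_olivetti_faces helper from the dev version:

    from sklearn.datasets import fetch_olivetti_faces

    # 400 images of 40 distinct subjects: only 10 samples per class,
    # which is why the large number of classes hurts here.
    faces = fetch_olivetti_faces(shuffle=True, random_state=0)
    print(faces.data.shape)          # (400, 4096): 64x64 grayscale images
    print(len(set(faces.target)))    # 40 classes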
