I would like to make a related suggestion, but instead of focusing on the upper bound for the number of trees, I would rather focus on choosing the lower bound. From a theoretical perspective, it doesn't make sense to me that fewer trees would result in a random forest model with better generalization performance: averaging the predictions of more independently grown, randomized trees mainly reduces the variance of the ensemble. If you observe better performance on the same independent test set with fewer trees, I would say that this is likely not a good indicator of better generalization performance. It could be due to overfitting, train/test set resampling, and/or picking up artifacts in the dataset.
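A quick way to check this on your own data is to grow a single forest incrementally and watch the out-of-bag (OOB) error. The sketch below is only an illustration: it assumes scikit-learn's `RandomForestClassifier` with `warm_start=True` and `oob_score=True`, and uses a synthetic `make_classification` dataset in place of real data. Apart from noise, the OOB error usually levels off as trees are added rather than showing a clear minimum at a small forest size.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

# warm_start=True keeps the trees fitted so far and only adds new ones on each
# fit() call; oob_score=True scores every sample using only the trees whose
# bootstrap sample did not contain it.
forest = RandomForestClassifier(warm_start=True, oob_score=True,
                                random_state=0, n_jobs=-1)

oob_errors = {}
for n_trees in [25, 50, 100, 200, 400]:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)  # grows the forest up to n_trees in total
    oob_errors[n_trees] = 1.0 - forest.oob_score_

for n_trees, err in oob_errors.items():
    print(f"{n_trees:4d} trees: OOB error = {err:.4f}")
```

Because the same forest is extended rather than refit from scratch, the whole curve costs about as much as fitting the largest forest once.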
As a general suggestion, I would choose a reasonable number of trees that seems computationally feasible given the size of the dataset and the number of hyperparameters to compare via model selection. Then, after tuning, I would use the best hyperparameter setting with 10x more trees and see whether you notice any significant difference in the cross-validation performance. Next, I would take the model with those best hyperparameters, fit it to the whole training set, and evaluate its performance on the independent test set.

Best,
Sebastian

> On Dec 24, 2018, at 9:27 PM, Brown J.B. via scikit-learn <scikit-learn@python.org> wrote:
>
>> Take random forest as example, if I give estimator from 10 to 10000(10, 100, 1000, 10000) into grid search.
>> Based on the result, I found estimator=100 is the best, but I don't know lower or greater than 100 is better.
>> How should I decide? brute force or any tools better than GridSearchCV?
>
> A simple but nonetheless practical solution is to
> (1) start with an upper bound on the number of trees you are willing to accept in the model,
> (2) obtain its performance (ACC, MCC, F1, etc) as the starting reference point,
> (3) systematically lower the number of trees (log2 scale down, fixed size decrement, etc)
> (4) obtain the reduced forest size performance,
> (5) Repeat (3)-(4) until [performance(reference) - performance(current forest size)] > tolerance
>
> You can encapsulate that in a function which then returns the final model you obtain.
> From the model object, the number of trees can be obtained.
>
> J.B.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
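The tune-then-scale-up workflow I suggested above might look roughly like this in scikit-learn. This is a sketch under assumptions: the synthetic dataset, the parameter grid, accuracy as the score, and the base of 50 trees are arbitrary placeholders. Using `cv=5` in both `GridSearchCV` and `cross_val_score` gives the same stratified folds for a classifier, so the two CV numbers are directly comparable.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Placeholder data; keep the test set aside until the very end.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 1: tune the other hyperparameters with an affordable number of trees.
base_trees = 50  # arbitrary, "computationally feasible" choice
param_grid = {"max_features": ["sqrt", 0.5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=base_trees, random_state=0),
    param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

# Step 2: same best hyperparameters, 10x more trees; compare CV performance.
big_forest = clone(search.best_estimator_).set_params(
    n_estimators=10 * base_trees)
big_scores = cross_val_score(big_forest, X_train, y_train, cv=5, n_jobs=-1)
print(f"CV accuracy, {base_trees} trees: {search.best_score_:.4f}")
print(f"CV accuracy, {10 * base_trees} trees: "
      f"{big_scores.mean():.4f} +/- {big_scores.std():.4f}")

# Step 3: fit on the whole training set, evaluate once on the test set.
final_model = big_forest.fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```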
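J.B.'s quoted steps (1)-(5) could be wrapped in a function along these lines. The function name `shrink_forest`, its parameters, the halving (log2) schedule, cross-validated accuracy as the performance measure, and the choice to return the smallest forest still within tolerance (rather than the first one that falls outside it) are all illustrative assumptions. As J.B. notes, the resulting number of trees can then be read from the model object, here via `n_estimators`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def shrink_forest(X, y, max_trees=1024, min_trees=8, tolerance=0.01,
                  cv=5, scoring="accuracy", random_state=0):
    """Hypothetical helper: halve the number of trees until the CV score
    drops more than `tolerance` below the score of the largest forest, then
    return the smallest forest within tolerance (fitted on X, y) together
    with the reference score."""
    def cv_score(n_trees):
        model = RandomForestClassifier(n_estimators=n_trees,
                                       random_state=random_state, n_jobs=-1)
        return cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()

    # Steps (1)-(2): reference performance at the upper bound.
    reference = cv_score(max_trees)
    best_n = max_trees
    n_trees = max_trees // 2
    # Steps (3)-(5): shrink on a log2 scale until the loss exceeds tolerance.
    while n_trees >= min_trees:
        if reference - cv_score(n_trees) > tolerance:
            break
        best_n = n_trees
        n_trees //= 2

    final = RandomForestClassifier(n_estimators=best_n,
                                   random_state=random_state, n_jobs=-1)
    return final.fit(X, y), reference


# Usage on placeholder data.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
model, reference = shrink_forest(X, y, max_trees=256, tolerance=0.01)
print(f"Selected {model.n_estimators} trees (reference CV score {reference:.4f})")
```

Swapping the halving step for a fixed decrement (the other option J.B. mentions) only changes the line that updates `n_trees`.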