> Take random forest as an example: if I give estimators from 10 to 10000 (10, 100, 1000, 10000) to grid search,
> based on the result I find estimators=100 is the best, but I don't know whether lower or greater than 100 is better.
> How should I decide? Brute force, or any tools better than GridSearchCV?
A simple but nonetheless practical solution is to:

(1) start with an upper bound on the number of trees you are willing to accept in the model,
(2) obtain its performance (ACC, MCC, F1, etc.) as the starting reference point,
(3) systematically lower the number of trees (log2 scale down, fixed-size decrement, etc.),
(4) obtain the performance at the reduced forest size,
(5) repeat (3)-(4) until performance(reference) - performance(current forest size) > tolerance.

You can encapsulate that in a function which then returns the final model you obtain. From the model object, the number of trees can be read off (the `n_estimators` attribute).

J.B.
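The steps above can be sketched roughly as follows. This is only an illustration, not a polished implementation: the halving schedule, `max_trees=128`, `tolerance=0.01`, and the use of mean cross-validated accuracy as the performance metric are all assumptions you would tune for your own problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def shrink_forest(X, y, max_trees=128, tolerance=0.01, cv=5, seed=0):
    """Halve n_estimators until CV accuracy drops more than `tolerance`
    below the reference score obtained at `max_trees` (step (2))."""

    def cv_score(n_trees):
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        return cross_val_score(clf, X, y, cv=cv).mean()

    reference = cv_score(max_trees)  # starting reference point
    n = max_trees
    while n > 1:
        candidate = n // 2  # log2 scale down, step (3)
        if reference - cv_score(candidate) > tolerance:  # stop rule, step (5)
            break
        n = candidate
    # refit the final, smaller forest on all the data and return it
    return RandomForestClassifier(n_estimators=n, random_state=seed).fit(X, y)


# toy data just to exercise the function
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model = shrink_forest(X, y)
print(model.n_estimators)
```

The number of trees in the returned model is then available as `model.n_estimators`.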
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn