> Take random forest as an example: if I give estimators from 10 to 10000
> (10, 100, 1000, 10000) to grid search, the result says estimator=100 is
> the best, but I don't know whether a value lower or greater than 100
> would be better.
> How should I decide? Brute force, or are there tools better than
> GridSearchCV?
>

A simple but nonetheless practical solution is to
  (1) start with an upper bound on the number of trees you are willing to
accept in the model,
  (2) obtain its performance (ACC, MCC, F1, etc.) as the starting reference
point,
  (3) systematically lower the number of trees (log2 scale-down, fixed-size
decrement, etc.),
  (4) obtain the reduced forest's performance,
  (5) repeat (3)-(4) until [performance(reference) - performance(current
forest size)] > tolerance.

You can encapsulate that in a function which then returns the final model
you obtain.
From the model object, the number of trees can be read off.
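A minimal sketch of such a function, assuming scikit-learn's
RandomForestClassifier and cross-validated accuracy as the performance
measure (the function name `shrink_forest` and the default values are
illustrative, not from any library):

```python
# Sketch of steps (1)-(5): start from an upper bound on the number of
# trees and halve it until cross-validated accuracy drops by more than
# `tolerance` relative to the reference forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def shrink_forest(X, y, max_trees=512, tolerance=0.01, random_state=0):
    """Return a fitted forest with the smallest acceptable tree count."""

    def score(n_trees):
        clf = RandomForestClassifier(n_estimators=n_trees,
                                     random_state=random_state)
        return cross_val_score(clf, X, y, cv=3).mean()

    reference = score(max_trees)   # (1)-(2): upper bound and its score
    n, best_n = max_trees, max_trees
    while n > 1:
        n //= 2                    # (3): log2 scale-down
        current = score(n)         # (4): reduced forest's performance
        if reference - current > tolerance:
            break                  # (5): stop once too much accuracy is lost
        best_n = n

    final = RandomForestClassifier(n_estimators=best_n,
                                   random_state=random_state)
    return final.fit(X, y)


# Toy usage on synthetic data; `model.n_estimators` is the chosen size.
X, y = make_classification(n_samples=300, random_state=0)
model = shrink_forest(X, y)
print(model.n_estimators)
```

The log2 scale-down could be swapped for a fixed-size decrement by
replacing the `n //= 2` line; the stopping rule is unchanged.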

J.B.
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
