Re: [scikit-learn] Any way to tune the parameters better than GridSearchCV?

2018-12-24 Thread Sebastian Raschka
I would like to make a related suggestion, but instead of focusing on the
upper bound for the number of trees, I would rather focus on choosing the
lower bound. From a theoretical perspective, it doesn't make sense to me how
fewer trees could result in a random forest with better generalization
performance. If you observe better performance on the same independent test
set with fewer trees, I would say that this is likely not a good indicator of
better generalization performance; it could be due to overfitting to that
particular train/test split and/or picking up artifacts in the dataset.

As a general approach, I would suggest choosing a number of trees that seems
computationally feasible given the size of the dataset and the number of
hyperparameter settings to compare via model selection. Then, after tuning, I
would re-run the best hyperparameter setting with 10x more trees and see
whether there is any significant difference in the cross-validation
performance. Finally, I would fit the model with those best hyperparameters
to the whole training set and evaluate its performance on the independent
test set.
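
For concreteness, a rough sketch of that workflow in code; the synthetic
data, the parameter grid, and the CV settings are placeholder assumptions,
not something prescribed above:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune with a forest size that is computationally feasible (here, 100 trees).
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": ["sqrt", "log2"],
                "min_samples_leaf": [1, 5, 10]},
    cv=10,
)
search.fit(X_train, y_train)

# Re-run the best setting with 10x more trees; the cross-validation
# performance should not differ much if 100 trees were already enough.
big = RandomForestClassifier(n_estimators=1000, random_state=0,
                             **search.best_params_)
print(cross_val_score(big, X_train, y_train, cv=10).mean())

# Fit on the whole training set and evaluate once on the independent test set.
big.fit(X_train, y_train)
print(big.score(X_test, y_test))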

Best,
Sebastian


> On Dec 24, 2018, at 9:27 PM, Brown J.B. via scikit-learn wrote:
> 
>> Take random forest as an example: say I give n_estimators values from 10 to
>> 10000 (10, 100, 1000, 10000) to grid search.
>> Based on the result, I found n_estimators=100 is the best, but I don't know
>> whether something lower or greater than 100 would be better.
>> How should I decide? Brute force, or are there any tools better than
>> GridSearchCV?
> 
> A simple but nonetheless practical solution is to 
>   (1) start with an upper bound on the number of trees you are willing to 
> accept in the model, 
>   (2) obtain its performance (ACC, MCC, F1, etc) as the starting reference 
> point,
>   (3) systematically lower the number of trees (log2 scale down, fixed size
> decrement, etc.),
>   (4) obtain the performance at the reduced forest size,
>   (5) repeat (3)-(4) until [performance(reference) - performance(current
> forest size)] > tolerance.
> 
> You can encapsulate that in a function which then returns the final model you 
> obtain.
> From the model object, the number of trees can be obtained.
> 
> J.B.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Any way to tune the parameters better than GridSearchCV?

2018-12-24 Thread Brown J.B. via scikit-learn
> Take random forest as an example: say I give n_estimators values from 10 to
> 10000 (10, 100, 1000, 10000) to grid search.
> Based on the result, I found n_estimators=100 is the best, but I don't know
> whether something lower or greater than 100 would be better.
> How should I decide? Brute force, or are there any tools better than
> GridSearchCV?
>

A simple but nonetheless practical solution is to
  (1) start with an upper bound on the number of trees you are willing to
accept in the model,
  (2) obtain its performance (ACC, MCC, F1, etc) as the starting reference
point,
  (3) systematically lower the number of trees (log2 scale down, fixed size
decrement, etc.),
  (4) obtain the performance at the reduced forest size,
  (5) repeat (3)-(4) until [performance(reference) - performance(current
forest size)] > tolerance.

You can encapsulate that in a function which then returns the final model
you obtain.
From the model object, the number of trees can be obtained.
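
A rough sketch of that recipe; the synthetic data, the 5-fold CV score as the
performance measure, and halving as the scale-down schedule are assumptions
for illustration, not part of the steps above:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def shrink_forest(X, y, max_trees=1024, tolerance=0.01, cv=5):
    def score(n_trees):
        model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        return cross_val_score(model, X, y, cv=cv).mean()

    # (1)-(2): the performance at the upper bound is the reference point.
    reference = score(max_trees)
    n_trees = max_trees
    # (3)-(5): halve the forest until the drop exceeds the tolerance.
    while n_trees > 1:
        candidate = n_trees // 2
        if reference - score(candidate) > tolerance:
            break
        n_trees = candidate
    # Fit and return the final model at the accepted size.
    return RandomForestClassifier(n_estimators=n_trees,
                                  random_state=0).fit(X, y)

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data
model = shrink_forest(X, y)
print(model.n_estimators)  # the number of trees, read off the model object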

J.B.


[scikit-learn] Any way to tune the parameters better than GridSearchCV?

2018-12-24 Thread lampahome
Take random forest as an example: say I give n_estimators values from 10 to
10000 (10, 100, 1000, 10000) to grid search.
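
(A sketch of the kind of search being described; the actual code was not
posted, so the estimator and CV settings here are guesses:)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 100, 1000, 10000]},
    cv=5,
)
# After search.fit(X, y), search.best_params_ reports the winning value.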

Based on the result, I found n_estimators=100 is the best, but I don't know
whether something lower or greater than 100 would be better.

How should I decide? Brute force, or are there any tools better than
GridSearchCV?

thx