Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

Sebastian Raschka Fri, 29 Jan 2016 13:52:50 -0800

> On Jan 29, 2016, at 4:45 PM, muhammad waseem <[email protected]> wrote:
> 
> I meant, how do I make sure that I don't miss the "Good" combination that you 
> mentioned?


Here, we are back to an exhaustive search on an infinitely fine grid :). In 
practice, it's really about finding the "sweet spot" that is "practical" given 
your problem and available resources.
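
One way to make "practical" concrete is to fix the search budget up front and
let a randomized search spend it. The following is only a minimal sketch: the
synthetic data stands in for the real data set (6 inputs, 1 output), the
parameter ranges are illustrative assumptions rather than recommendations, and
the import path assumes scikit-learn 0.18 or later (sklearn.model_selection).

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real data: 6 inputs, 1 output.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Illustrative ranges (assumptions); randint(low, high) excludes `high`.
param_distributions = {
    "max_depth": randint(1, 16),         # integers 1..15
    "min_samples_leaf": randint(1, 31),  # integers 1..30
}

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # the budget: only 5 sampled combinations are evaluated
    cv=3,            # 3-fold cross-validation for each combination
    random_state=0,  # reproducible sampling of the combinations
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Raising n_iter buys a more thorough search at a proportionally higher cost,
which is the trade-off in question.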


> 
> Also, for the second point: maybe I should consider the computational time 
> and then make sure that I have a sufficient number of estimators in the 
> parametric study? 

What do you mean by parametric study, exactly? Do you mean that you are doing 
the hyperparameter search for an empirical comparison study, or do you just 
want to get a good model?



> 
> On Fri, Jan 29, 2016 at 9:38 PM, muhammad waseem <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> Thanks for your reply. So this means I should start with e.g. "max_depth": 
>> [1,4,10,15], "min_samples_leaf":[1,10,20,30], and if the best values are 
>> max_depth=10 and min_samples_leaf=10, then I should explore values close to 
>> these. Am I right?
> 
> 
> Yes, this would work. However, keep in mind that you may be missing a "good" 
> combination this way. And if you have a large value for n_estimators, tuning 
> a random forest can be "relatively" expensive. Plus, you typically don't 
> want or need to prune the trees here; that's basically the whole idea behind 
> RF.
> 
> So how do I make sure that I don't miss the "Good" combination? 
> 
>> Shall I use a small number of estimators while conducting this parametric 
>> study? After that, can I use a higher value while fitting my model?
> 
> Also here, the parameters that you tuned may only be good for the model based 
> on the specific number of estimators. In general, I would maybe advise 
> against tuning the hyperparameters at all and instead use the computational 
> time to increase n_estimators.
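
As a rough illustration of spending the compute on more trees instead of on
tuning, the sketch below grows a single forest with otherwise default settings
and records the out-of-bag R^2 at each size. It is not taken from the thread:
the synthetic data and the forest sizes (50, 100, 200) are assumptions chosen
only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data: 6 inputs, 1 output.
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)

# warm_start=True keeps the trees already built and only adds new ones on
# each fit; oob_score=True evaluates every tree on the samples left out of
# its bootstrap sample, so no separate validation split is needed here.
forest = RandomForestRegressor(
    n_estimators=50, warm_start=True, oob_score=True, random_state=0
)

oob_scores = {}
for n in (50, 100, 200):
    forest.set_params(n_estimators=n)
    forest.fit(X, y)
    oob_scores[n] = forest.oob_score_  # out-of-bag R^2 with n trees

for n, score in oob_scores.items():
    print(n, round(score, 3))
```

Once the out-of-bag score stops improving noticeably, adding further trees
mostly costs time.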
> 
> Maybe I should consider the computational time and then make sure that I 
> have a sufficient number of estimators in the parametric study? 
> 
>> On Jan 29, 2016, at 4:18 PM, muhammad waseem <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi Sebastian,
>> Thanks for your reply. So this means I should start with e.g. "max_depth": 
>> [1,4,10,15], "min_samples_leaf":[1,10,20,30], and if the best values are 
>> max_depth=10 and min_samples_leaf=10, then I should explore values close to 
>> these. Am I right?
>> 
>> Shall I use a small number of estimators while conducting this parametric 
>> study? After that, can I use a higher value while fitting my model? 
>> Will this change the other parameters, meaning does n_estimators depend on 
>> the other parameters? 
>> 
>> Also, should I use early stopping while doing GridSearchCV?
>> 
>> Thanks again.
>> Regards
>> Waseem
>> 
>> On Fri, Jan 29, 2016 at 6:57 PM, Sebastian Raschka <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi, Waseem,
>> with a fine-enough grid, GridSearchCV would be more "thorough" than the 
>> randomized search. However, the problem is essentially some sort of 
>> combinatorial explosion. Typically, I start with a "rougher" grid (the 
>> candidate values for each parameter are more "spaced out"). After 
>> that, I use a "finer" grid around the parameter values that came up in the 
>> previous search.
>> However, it all comes down to computational time vs. being thorough. Or in 
>> other words, grid search is an exhaustive search whereas randomized search 
>> is a computationally "more efficient" approach.
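
The coarse-then-fine idea and the combinatorial explosion can be sketched in
plain Python. This is only an illustration: `refine` and `n_fits` are
hypothetical helpers (not part of scikit-learn), the midpoint rule is one
possible way to build the finer grid, and the 5-fold cross-validation is an
assumption. The coarse grid and the "best" values are the ones used as an
example elsewhere in this thread.

```python
from math import prod


def refine(values, best):
    """Build a finer integer grid between the coarse neighbours of `best`."""
    values = sorted(values)
    i = values.index(best)
    lo = values[max(i - 1, 0)]                # coarse neighbour below (or best)
    hi = values[min(i + 1, len(values) - 1)]  # coarse neighbour above (or best)
    # Neighbours, best value, and the two midpoints; the set drops duplicates.
    return sorted({lo, (lo + best) // 2, best, (best + hi) // 2, hi})


def n_fits(grid, cv=5):
    """Model fits an exhaustive grid search needs: combinations times folds."""
    return prod(len(v) for v in grid.values()) * cv


coarse_grid = {"max_depth": [1, 4, 10, 15], "min_samples_leaf": [1, 10, 20, 30]}
best = {"max_depth": 10, "min_samples_leaf": 10}  # example outcome, not a result
fine_grid = {name: refine(vals, best[name]) for name, vals in coarse_grid.items()}

# A single grid that tries every integer in the same ranges, for comparison.
full_grid = {"max_depth": list(range(1, 16)),
             "min_samples_leaf": list(range(1, 31))}

print(fine_grid)
print(n_fits(coarse_grid), n_fits(fine_grid), n_fits(full_grid))
```

With these numbers the two-stage search costs 80 + 125 fits, against 2250 for
the single exhaustive grid, and each additional tuned parameter multiplies
those counts again. A dictionary like `fine_grid` has the same shape as the
`param_grid` argument that GridSearchCV expects.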
>> 
>> 
>> > On Jan 29, 2016, at 11:45 AM, muhammad waseem <[email protected] 
>> > <mailto:[email protected]>> wrote:
>> >
>> > Hello All,
>> > I am new to scikit-learn and ML, and I am trying to train my model using 
>> > random forest and gradient boosting tree regressors. I was wondering what 
>> > the best way to do hyperparameter tuning is: shall I use GridSearchCV or 
>> > RandomizedSearchCV? I have read that the performance of RandomizedSearchCV 
>> > is almost the same as that of GridSearchCV (most of the time). If I go 
>> > with RandomizedSearchCV, then what should be the range of values for the 
>> > different parameters? How will I know that the range I am selecting is 
>> > the correct one?
>> >
>> > Also, what about the number of estimators? In GridSearchCV or 
>> > RandomizedSearchCV, shall I start with a low value and then, after 
>> > selecting the other parameters, choose a large number of estimators for 
>> > fitting purposes? Am I right?
>> >
>> > Shall I always use early stopping, no matter if I use Grid search or 
>> > Randomised Search?
>> >
>> > P.S: Training data: Number of inputs = 6
>> >                     Number of outputs = 1
>> >                     Number of samples (rows) = 8526
>> >      Testing data:  Number of samples (rows) = 1416
>> >
>> > Thanks
>> > Kindest Regards
>> > Waseem
>> > Scikit-learn-general mailing list
>> > [email protected] 
>> > <mailto:[email protected]>
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general 
>> > <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
>> 
>> 
> 
> 
> 
> 
> 
> 
> -- 
> Dr Muhammad Waseem Ahmad
> Research Associate,
> BRE Center for Sustainable Construction,
> School of Engineering,
> Cardiff University,
> Cardiff, UK.
> 

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
