Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

muhammad waseem Fri, 29 Jan 2016 13:59:32 -0800

On Fri, Jan 29, 2016 at 9:51 PM, Sebastian Raschka <[email protected]>
wrote:


>
> On Jan 29, 2016, at 4:45 PM, muhammad waseem <[email protected]>
> wrote:
>
> I meant, how do I make sure that I don't miss the "good" combination that you
> mentioned?
>
>
> Here, we are back to an exhaustive search on an infinitely fine grid :).
> It's really about finding the "sweet spot" that is "practical" given your
> problem and available resources.
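
One way to make "practical given your resources" concrete is to fix the search budget up front. The sketch below does this with RandomizedSearchCV and its n_iter parameter. It assumes a recent scikit-learn (where the search classes live in sklearn.model_selection) plus SciPy; the synthetic data, the parameter ranges and the small n_estimators are illustrative assumptions, not recommendations from this thread.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Illustrative ranges only. randint(low, high) samples integers in [low, high).
param_distributions = {
    "max_depth": randint(1, 16),         # integers 1..15
    "min_samples_leaf": randint(1, 31),  # integers 1..30
}

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_distributions,
    n_iter=10,       # the budget: exactly 10 sampled combinations
    cv=3,
    random_state=0,  # makes the sampled combinations reproducible
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The cost here is n_iter times the number of folds, regardless of how wide the ranges are, which is the trade-off against an exhaustive grid.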
>
>
>
> Also, for the second point: maybe I should consider the computational time
> and then make sure that I have a large enough number of estimators in the
> parametric study?
>
>
> What do you mean by parametric study, exactly? Do you mean that you are
> doing the hyperparam search for an empirical comparison study or do you
> just want to get a good model?
>
Well, both could be addressed, no? But my first focus is to get the model
right, with well-chosen parameters.

>
>
>
> On Fri, Jan 29, 2016 at 9:38 PM, muhammad waseem <[email protected]>
> wrote:
>
>>
>>> Thanks for your reply. So this means I should start with e.g. "max_depth":
>>> [1,4,10,15], "min_samples_leaf": [1,10,20,30], and if the search selects
>>> max_depth=10 and min_samples_leaf=10, then I should explore values close
>>> to these. Am I right?
>>>
>>>
>>> Yes, this would work. However, keep in mind that you may be missing a
>>> "good" combination this way. And if n_estimators is large, tuning a
>>> random forest can be "relatively" expensive. Plus, you typically don't
>>> want or need to prune the trees here; that's basically the whole idea
>>> behind RF.
>>>
>>
>> So how do I make sure that I don't miss the "good" combination?
>>
>>>
>>> Shall I use a small number of estimators while conducting this parametric
>>> study, and then use a higher value when fitting my model?
>>>
>>>
>>> Also here, the parameters that you tuned may only be good for a model
>>> with that specific number of estimators. In general, I would maybe
>>> advise against tuning the hyperparameters at all and instead spend the
>>> computational time on increasing n_estimators.
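
A minimal sketch of what "spending the time on n_estimators" could look like: grow one forest in steps and watch the out-of-bag (OOB) R^2 level off. It relies on the warm_start and oob_score options of RandomForestRegressor. The synthetic data and the step sizes are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)

forest = RandomForestRegressor(
    warm_start=True,  # keep the trees already built; refitting only adds new ones
    oob_score=True,   # score each sample using trees that did not train on it
    random_state=0,
)

oob_r2 = {}
for n in [50, 100, 200]:  # illustrative forest sizes
    forest.set_params(n_estimators=n)
    forest.fit(X, y)      # adds trees until the forest has n of them
    oob_r2[n] = forest.oob_score_

for n, score in oob_r2.items():
    print(f"n_estimators={n:4d}  OOB R^2={score:.3f}")
```

Once the OOB score stops improving noticeably, additional trees mostly cost time. No separate validation split is needed for this check, since each tree's bootstrap sample leaves some rows out.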
>>>
>>
>> Maybe I should consider the computational time and then make sure that I
>> have a large enough number of estimators in the parametric study?
>>
>>>
>>> On Jan 29, 2016, at 4:18 PM, muhammad waseem <[email protected]>
>>> wrote:
>>>
>>> Hi Sebastian,
>>> Thanks for your reply. So this means I should start with e.g.
>>> "max_depth": [1,4,10,15], "min_samples_leaf": [1,10,20,30], and if the
>>> search selects max_depth=10 and min_samples_leaf=10, then I should
>>> explore values close to these. Am I right?
>>>
>>> Shall I use a small number of estimators while conducting this
>>> parametric study, and then use a higher value when fitting my model?
>>> Will this change the other parameters, meaning does n_estimators depend
>>> on the other parameters?
>>>
>>> Also, should I use early stopping while doing GridSearchCV?
>>>
>>> Thanks again.
>>> Regards
>>> Waseem
>>>
>>> On Fri, Jan 29, 2016 at 6:57 PM, Sebastian Raschka <[email protected]>
>>> wrote:
>>>
>>>> Hi, Waseem,
>>>> with a fine-enough grid, the GridSearchCV would be more "thorough" than
>>>> the randomized search. However, the problem is essentially some sort of
>>>> combinatorial explosion. Typically, I start with a "rougher" grid (the
>>>> different parameters are more "spaced out" relative to each other). After
>>>> that, I use a "finer" grid around the parameters that came up in the
>>>> previous search.
>>>> However, it all comes down to computational time vs. being thorough. Or
>>>> in other words, grid search is an exhaustive search whereas randomized
>>>> search is a computationally "more efficient" approach.
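
The coarse-then-fine approach described above can be sketched roughly as follows. The coarse values are the ones that come up elsewhere in this thread. The neighbours() helper, its step sizes, the synthetic data and the small n_estimators are hypothetical choices for illustration, and the imports assume a recent scikit-learn where the search classes live in sklearn.model_selection.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)


def run_grid(param_grid):
    """Run a 3-fold grid search over param_grid and return the fitted search."""
    search = GridSearchCV(
        RandomForestRegressor(n_estimators=20, random_state=0),
        param_grid,
        cv=3,
    )
    return search.fit(X, y)


def neighbours(best, step, lower=1):
    """Hypothetical helper: values just below, at and above best, clipped at lower."""
    return sorted({max(lower, best - step), best, best + step})


# Stage 1: coarse grid with widely spaced values.
coarse_grid = {"max_depth": [1, 4, 10, 15], "min_samples_leaf": [1, 10, 20, 30]}
n_coarse = len(ParameterGrid(coarse_grid))  # 4 * 4 = 16 candidates
coarse = run_grid(coarse_grid)

# Stage 2: finer grid centred on the stage-1 winner.
fine_grid = {
    "max_depth": neighbours(coarse.best_params_["max_depth"], step=2),
    "min_samples_leaf": neighbours(coarse.best_params_["min_samples_leaf"], step=3),
}
fine = run_grid(fine_grid)

print("coarse:", coarse.best_params_)
print("fine:  ", fine.best_params_)
```

The candidate count is the product of the list lengths, which is where the combinatorial explosion comes from: a third parameter with four values would already give 64 candidates per stage.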
>>>>
>>>>
>>>> > On Jan 29, 2016, at 11:45 AM, muhammad waseem <[email protected]> wrote:
>>>> >
>>>> > Hello All,
>>>> > I am new to scikit-learn and ML, and am trying to train my model using
>>>> > random forest and gradient boosting tree regressors. I was wondering
>>>> > what the best way to do hyperparameter tuning is: shall I use
>>>> > GridSearchCV or RandomizedSearchCV? I have read that the performance of
>>>> > RandomizedSearchCV is almost the same as that of GridSearchCV (most of
>>>> > the time). If I go with RandomizedSearchCV, what should the range of
>>>> > values be for the different parameters? How will I know that the range
>>>> > I am selecting is the correct one?
>>>> >
>>>> > Also, what about the number of estimators? In GridSearchCV or
>>>> > RandomizedSearchCV, shall I start with a low value and then, after
>>>> > selecting the other parameters, choose a large number of estimators
>>>> > for the final fit? Am I right?
>>>> >
>>>> > Shall I always use early stopping, no matter whether I use grid search
>>>> > or randomized search?
>>>> >
>>>> > P.S. Training data: number of inputs = 6
>>>> >                     number of outputs = 1
>>>> >                     number of samples (rows) = 8526
>>>> >      Testing data:  number of samples (rows) = 1416
>>>> >
>>>> > Thanks
>>>> > Kindest Regards
>>>> > Waseem
>>>> >
>>>> ------------------------------------------------------------------------------
>>>> > Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>> > APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>> > Monitor end-to-end web transactions and take corrective actions now
>>>> > Troubleshoot faster and improve end-user experience. Signup Now!
>>>> >
>>>> > http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>>>> > _______________________________________________
>>>> > Scikit-learn-general mailing list
>>>> > [email protected]
>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

