Re: [Scikit-learn-general] Random forest low score on testing data

Luca Puggini Fri, 05 Feb 2016 09:01:07 -0800

Here are the extra trees:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor


They work similarly to random forest.  In my experience, RF often tends to
overfit.
I suggest you start with the default parameters and cross-validate only
on the max_depth parameter.  Start with small values of max_depth [2, 3, 5,
7, 10] and check how the performance of the model changes.
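
A minimal sketch of that suggestion, under a few assumptions that are not
from this thread: the data is a synthetic stand-in from make_regression,
n_estimators=50 and random_state=0 are picked only to keep the example quick
and repeatable, and the import path is sklearn.model_selection (releases
before 0.18 used sklearn.grid_search and sklearn.cross_validation instead).
Everything except max_depth is otherwise left at its default.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Cross-validate over max_depth only, with 5 folds.
search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 3, 5, 7, 10]},
    cv=5,
    return_train_score=True,
)
search.fit(X_train, y_train)

# A large gap between the training and cross-validation scores at a given
# depth is a sign of overfitting.
for depth, train_r2, cv_r2 in zip(
    search.cv_results_["param_max_depth"],
    search.cv_results_["mean_train_score"],
    search.cv_results_["mean_test_score"],
):
    print(f"max_depth={depth}: train R^2={train_r2:.3f}, CV R^2={cv_r2:.3f}")

print("best max_depth:", search.best_params_["max_depth"])
# search.score uses the estimator refit on all of X_train with the best depth.
print("held-out R^2:", round(search.score(X_test, y_test), 3))
```

Swapping ExtraTreesRegressor for RandomForestRegressor in the same loop gives
a like-for-like comparison of the two models at each depth.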

Good Luck.
Luca

On Fri, Feb 5, 2016 at 4:28 PM muhammad waseem <[email protected]>
wrote:

> Hi Luca,
> Could you please explain how I can use these randomized trees in scikit-learn?
> So you suggest I should be using Random forest?
>
>
> On Fri, Feb 5, 2016 at 4:13 PM, Luca Puggini <[email protected]> wrote:
>
>> To me the score is not so low. The model is slightly overfitting. Try to
>> repeat the same process with extremely randomized trees instead of random
>> forest and try to keep a low depth.
>> On Fri 5 Feb 2016 at 16:01 muhammad waseem <[email protected]>
>> wrote:
>>
>>> Dear All,
>>> I am trying to train my model using Scikit-learn's Random forest
>>> (Regression) and have tried to use GridSearch with Cross-validation (CV=5)
>>> to tune hyperparameters. I fixed n_estimators =2000 for all cases. Below
>>> are the few searches that I performed.
>>>
>>> 1) max_features :[1,3,5], max_depth :[1,5,10,15],
>>> min_samples_split:[2,6,8,10], bootstrap:[True, False]
>>> The best were max_features=5, max_depth=15, min_samples_split=10,
>>> bootstrap=True
>>> Best score = 0.8724
>>>
>>> Then I searched close to the parameters that were best;
>>> 2) max_features :[3,5,6], max_depth :[10,20,30,40],
>>> min_samples_split:[8,16,20,24], bootstrap:[True, False]
>>> The best were max_features=5, max_depth=30, min_samples_split=20,
>>> bootstrap=True
>>> Best score = 0.8722
>>>
>>> Again, I searched close to the parameters that were best;
>>> 3) max_features :[2,4,6], max_depth :[25,35,40,50],
>>> min_samples_split:[22,28,34,40], bootstrap:[True, False]
>>>
>>> The best were max_features=4, max_depth=25, min_samples_split=22,
>>> bootstrap=True
>>> Best score = 0.8725
>>>
>>> Then I used GridSearch among the best parameters found in the above runs
>>> and found the best ones to be max_features=4, max_depth=15,
>>> min_samples_split=10.
>>> Best score = 0.8729
>>>
>>> Then I used these parameters to predict for an unknown dataset but got a
>>> very low score (around 0.72).
>>>
>>> My questions are:
>>> 1) Am I doing the hyperparameter tuning correctly, or am I missing
>>> something?
>>>
>>> 2) Why is my testing score very low as compared to my training and
>>> validation score and how can I improve it so that I get good predictions
>>> out of my model?
>>>
>>> Sorry if these are basic questions; I am new to scikit-learn and ML.
>>>
>>> Thanks!
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>> Monitor end-to-end web transactions and take corrective actions now
>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>> --
>>
>> Sent by mobile phone
>>
>>
>>
>
>
