Re: [Scikit-learn-general] Random forest low score on testing data

Luca Puggini Fri, 05 Feb 2016 08:15:06 -0800

To me the score does not look that low; the model is slightly overfitting.
Try repeating the same process with extremely randomized trees
(ExtraTreesRegressor) instead of a random forest, and keep the maximum depth
low.
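
A rough sketch of that comparison, in case it helps. It assumes a recent
scikit-learn (where GridSearchCV lives in sklearn.model_selection) and uses
synthetic data from make_regression as a stand-in for the real dataset, which
is not shown in this thread. The grid values and n_estimators=50 are
illustrative only, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: the real dataset is not available here).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Shallow trees only; these grid values are illustrative.
param_grid = {"max_depth": [3, 5, 8], "max_features": [2, 4, 6]}

results = {}
for name, Model in [
    ("random_forest", RandomForestRegressor),
    ("extra_trees", ExtraTreesRegressor),
]:
    # 5-fold cross-validated search; the default score for regressors is R^2.
    search = GridSearchCV(Model(n_estimators=50, random_state=0), param_grid, cv=5)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_r2": search.best_score_,                  # mean validation R^2
        "train_r2": search.score(X_train, y_train),   # refit model on training data
        "test_r2": search.score(X_test, y_test),      # refit model on held-out data
    }

for name, r in results.items():
    print(name, r)
```

A large gap between train_r2 and cv_r2/test_r2 points to overfitting. A
large gap between cv_r2 and test_r2 suggests that the test data differs from
the data used for the search.
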
On Fri 5 Feb 2016 at 16:01 muhammad waseem <[email protected]> wrote:


> Dear All,
> I am trying to train my model using scikit-learn's random forest
> (regression) and have tried to use GridSearch with cross-validation (cv=5)
> to tune hyperparameters. I fixed n_estimators=2000 for all cases. Below
> are a few of the searches that I performed.
>
> 1) max_features: [1, 3, 5], max_depth: [1, 5, 10, 15],
> min_samples_split: [2, 6, 8, 10], bootstrap: [True, False]
> The best were max_features=5, max_depth=15, min_samples_split=10,
> bootstrap=True
> Best score = 0.8724
>
> Then I searched close to the parameters that were best:
> 2) max_features: [3, 5, 6], max_depth: [10, 20, 30, 40],
> min_samples_split: [8, 16, 20, 24], bootstrap: [True, False]
> The best were max_features=5, max_depth=30, min_samples_split=20,
> bootstrap=True
> Best score = 0.8722
>
> Again, I searched close to the parameters that were best:
> 3) max_features: [2, 4, 6], max_depth: [25, 35, 40, 50],
> min_samples_split: [22, 28, 34, 40], bootstrap: [True, False]
> The best were max_features=4, max_depth=25, min_samples_split=22,
> bootstrap=True
> Best score = 0.8725
>
> Then I used GridSearch among the best parameters found in the above runs
> and found the best ones to be max_features=4, max_depth=15,
> min_samples_split=10.
> Best score = 0.8729
>
> Then I used these parameters to predict on an unseen dataset but got a
> very low score (around 0.72).
>
> My questions are:
>
> 1) Am I doing the hyperparameter tuning correctly, or am I missing
> something?
>
> 2) Why is my testing score so low compared to my training and validation
> scores, and how can I improve it so that I get good predictions from my
> model?
>
> Sorry if these are basic questions; I am new to scikit-learn and ML.
>
> Thanks!
-- 

Sent by mobile phone

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
