Thanks Luca, I will give it a try. When you say extremely randomised, does
this mean using a large value for n_estimators?

Also, any idea how to solve the overfitting problem for random forest?

Regards
Waseem

On Fri, Feb 5, 2016 at 5:00 PM, Luca Puggini <lucapug...@gmail.com> wrote:

> Here are the extra trees:
> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor
>
> It works similarly to random forest.  In my experience RF often tends to
> overfit.
> I suggest you start with the default parameters and cross-validate
> only the max_depth parameter.  Start with small values of max_depth [2,
> 3, 5, 7, 10] and check how the performance of the model changes.
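A minimal sketch of the suggestion above, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection. The synthetic data from make_regression and n_estimators=50 are placeholders chosen only to keep the run short; they are not taken from the thread.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; replace with your own X, y).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Only max_depth is searched; n_estimators=50 just keeps the sketch fast,
# every other parameter is left at its default.
search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 3, 5, 7, 10]},
    cv=5,
    return_train_score=True,
)
search.fit(X, y)

# Print mean training and validation R^2 per depth; a gap that widens as
# depth grows is one sign of overfitting.
for depth, train, val in zip(
    search.cv_results_["param_max_depth"],
    search.cv_results_["mean_train_score"],
    search.cv_results_["mean_test_score"],
):
    print(f"max_depth={depth}: train R^2={train:.3f}, validation R^2={val:.3f}")

print("best max_depth:", search.best_params_["max_depth"])
```

Swapping ExtraTreesRegressor for RandomForestRegressor in the same loop gives a like-for-like comparison of the two ensembles at each depth.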
>
> Good Luck.
> Luca
>
> On Fri, Feb 5, 2016 at 4:28 PM muhammad waseem <m.waseem.ah...@gmail.com>
> wrote:
>
>> Hi Luca,
>> Could you please explain how I can use these randomized trees in
>> scikit-learn? So you suggest I should be using Random forest?
>>
>>
>> On Fri, Feb 5, 2016 at 4:13 PM, Luca Puggini <lucapug...@gmail.com>
>> wrote:
>>
>>> To me the score is not so low. The model is slightly overfitting. Try
>>> to repeat the same process with extremely randomized trees instead of
>>> random forest and try to keep a low depth.
>>> On Fri 5 Feb 2016 at 16:01 muhammad waseem <m.waseem.ah...@gmail.com>
>>> wrote:
>>>
>>>> Dear All,
>>>> I am trying to train my model using Scikit-learn's Random forest
>>>> (Regression) and have tried to use GridSearch with Cross-validation (CV=5)
>>>> to tune hyperparameters. I fixed n_estimators=2000 for all cases. Below
>>>> are the searches that I performed.
>>>>
>>>> 1) max_features :[1,3,5], max_depth :[1,5,10,15],
>>>> min_samples_split:[2,6,8,10], bootstrap:[True, False]
>>>> The best were max_features=5, max_depth=15, min_samples_split=10,
>>>> bootstrap=True
>>>> Best score = 0.8724
>>>>
>>>> Then I searched close to the parameters that were best:
>>>> 2) max_features :[3,5,6], max_depth :[10,20,30,40],
>>>> min_samples_split:[8,16,20,24], bootstrap:[True, False]
>>>> The best were max_features=5, max_depth=30, min_samples_split=20,
>>>> bootstrap=True
>>>> Best score = 0.8722
>>>>
>>>> Again, I searched close to the parameters that were best:
>>>> 3) max_features :[2,4,6], max_depth :[25,35,40,50],
>>>> min_samples_split:[22,28,34,40], bootstrap:[True, False]
>>>> The best were max_features=4, max_depth=25, min_samples_split=22,
>>>> bootstrap=True
>>>> Best score = 0.8725
>>>>
>>>> Then I used GridSearch among the best parameters found in the above
>>>> runs and found the best one to be max_features=4, max_depth=15,
>>>> min_samples_split=10,
>>>> Best score = 0.8729
>>>>
>>>> Then I used these parameters to predict for an unknown dataset but got
>>>> a very low score (around 0.72).
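One way to reproduce the comparison described above (cross-validated score versus score on unseen data) is to hold out a test split before tuning. This is only a sketch: the synthetic data, the reduced grid, and n_estimators=30 are assumptions made to keep it short, and the recent sklearn.model_selection API is assumed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; replace with your own X, y).
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)

# Set the test split aside first so the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A reduced version of the first grid in the message, to keep the run short.
param_grid = {
    "max_features": [1, 3, 5],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid=param_grid,
    cv=5,
)
search.fit(X_train, y_train)

# best_score_ is the mean cross-validated R^2 of the best setting;
# score() evaluates the refitted best model on the given data.
train_r2 = search.score(X_train, y_train)
cv_r2 = search.best_score_
test_r2 = search.score(X_test, y_test)

print("best params:", search.best_params_)
print(f"train R^2={train_r2:.3f}, CV R^2={cv_r2:.3f}, test R^2={test_r2:.3f}")
```

If the held-out score here tracks the CV score but the score on the separate unknown dataset does not, that points at a difference between the two datasets rather than at the tuning procedure itself.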
>>>>
>>>> My questions are:
>>>>
>>>> 1) Am I doing the hyperparameter tuning correctly, or am I missing
>>>> something?
>>>>
>>>> 2) Why is my testing score very low compared to my training and
>>>> validation scores, and how can I improve it so that I get good
>>>> predictions out of my model?
>>>>
>>>> Sorry if these are basic questions; I am new to scikit-learn and ML.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>> Monitor end-to-end web transactions and take corrective actions now
>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>> --
>>>
>>> Sent by mobile phone
>>>
>>>
>>>
>>
>>
> --
>
> Sent by mobile phone
>
>
>
>
