I'm a bit unclear what you expect shuffling the data to do, Luca, since, if
you bootstrap, you end up taking a random sample and re-ordering the data anyway.

Jacob
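
For illustration, here is a minimal NumPy sketch of the point about bootstrapping. It is not scikit-learn's internal code, and the array size and seed are arbitrary assumptions. A bootstrap draw picks row indices at random with replacement, so whatever order the rows were in beforehand is lost.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 1000
X = np.arange(n)  # stand-in for rows kept in their original (sorted) order

# One bootstrap draw: n row indices sampled with replacement.
idx = rng.integers(0, n, size=n)
sample = X[idx]

# Fraction of distinct rows that made it into the draw.
unique_fraction = np.unique(sample).size / n
# Check whether the original ordering survived the draw.
is_sorted = bool(np.all(np.diff(sample) >= 0))

print(f"distinct rows in the bootstrap sample: {unique_fraction:.3f}")
print(f"sample still in original order: {is_sorted}")
```

Roughly 63% of the rows show up in any one draw, in random order, whether or not the data was shuffled first.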

On Fri, Feb 5, 2016 at 1:32 PM, muhammad waseem <m.waseem.ah...@gmail.com>
wrote:

> Hi Luca,
> Thanks for your time and answer. I will try this with a lower max_depth
> (for both randomised trees and RF) to see what happens.
> By number of variables used at each split, you mean min_samples_split,
> right?
>
> I did not use the OOB score.
> I will also try shuffling my data.
>
> Thanks again.
>
>
> On Fri, Feb 5, 2016 at 8:46 PM, Luca Puggini <lucapug...@gmail.com> wrote:
>
>> The number of trees (n_estimators) should be as large as possible.
>> It does not cause overfitting. In random forests, overfitting is usually
>> caused by the depth and by variables with several unique values. I
>> suggest you start by using randomized trees with low depth. If you want to
>> use RF, you can try to reduce the number of variables used at each split.
>>
>> Note that if you use OOB to estimate the prediction error, it may be
>> biased when the number of trees is large.
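
In scikit-learn terms, the number of variables tried at each split is the max_features parameter (min_samples_split is the minimum number of samples a node needs before it can be split), and the OOB estimate is switched on with oob_score=True. A minimal sketch, assuming synthetic data from make_regression and arbitrary parameter values chosen only to keep it quick:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=100,
    max_features=3,   # number of features considered at each split
    max_depth=5,      # shallow trees, as suggested above
    bootstrap=True,   # OOB scoring needs bootstrap samples
    oob_score=True,   # compute R^2 on the out-of-bag samples
    random_state=0,
)
rf.fit(X, y)

print("OOB R^2:", rf.oob_score_)
```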
>>
>> In addition, I suggest you shuffle the data at the beginning if you
>> can.
>>
>> On Fri, Feb 5, 2016, 5:14 PM muhammad waseem <m.waseem.ah...@gmail.com>
>> wrote:
>>
>>> Thanks Luca, I will give it a try. When you say extremely randomised,
>>> does this mean using a large number of n_estimators?
>>>
>>> Also, any idea how to solve the overfitting problem for random forest?
>>>
>>> Regards
>>> Waseem
>>>
>>>
>>> On Fri, Feb 5, 2016 at 5:00 PM, Luca Puggini <lucapug...@gmail.com>
>>> wrote:
>>>
>>>> Here are the extra trees:
>>>> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor
>>>>
>>>> They work similarly to random forest. In my experience, RF often tends to
>>>> overfit.
>>>> I suggest you start with the default parameters and cross-validate
>>>> only the max_depth parameter. Start with small values of max_depth [2,
>>>> 3, 5, 7, 10] and check how the performance of the model changes.
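
The suggestion above could be sketched roughly as follows. The synthetic data stands in for the real dataset, and n_estimators=50 is an assumption made only to keep the run short; everything else is left at the library defaults.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Cross-validate over max_depth only, with 5 folds.
search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 3, 5, 7, 10]},
    cv=5,
)
search.fit(X, y)

# Show how the mean cross-validated R^2 changes with depth.
for depth, score in zip(search.cv_results_["param_max_depth"],
                        search.cv_results_["mean_test_score"]):
    print(f"max_depth={depth}: mean CV R^2 = {score:.3f}")
print("best:", search.best_params_)
```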
>>>>
>>>> Good Luck.
>>>> Luca
>>>>
>>>> On Fri, Feb 5, 2016 at 4:28 PM muhammad waseem <
>>>> m.waseem.ah...@gmail.com> wrote:
>>>>
>>>>> Hi Luca,
>>>>> Could you please explain how I can use these randomized trees in
>>>>> scikit-learn? So you suggest I should be using Random forest?
>>>>>
>>>>>
>>>>> On Fri, Feb 5, 2016 at 4:13 PM, Luca Puggini <lucapug...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> To me the score is not so low. The model is slightly overfitting.
>>>>>> Try repeating the same process with extremely randomized trees instead of
>>>>>> random forest, and try to keep the depth low.
>>>>>> On Fri 5 Feb 2016 at 16:01 muhammad waseem <m.waseem.ah...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear All,
>>>>>>> I am trying to train my model using Scikit-learn's Random forest
>>>>>>> (Regression) and have tried to use GridSearch with Cross-validation (CV=5)
>>>>>>> to tune hyperparameters. I fixed n_estimators=2000 for all cases. Below
>>>>>>> are the few searches that I performed.
>>>>>>>
>>>>>>> 1) max_features: [1,3,5], max_depth: [1,5,10,15],
>>>>>>> min_samples_split: [2,6,8,10], bootstrap: [True, False]
>>>>>>> The best were max_features=5, max_depth=15, min_samples_split=10,
>>>>>>> bootstrap=True
>>>>>>> Best score = 0.8724
>>>>>>>
>>>>>>> Then I searched close to the parameters that were best:
>>>>>>> 2) max_features: [3,5,6], max_depth: [10,20,30,40],
>>>>>>> min_samples_split: [8,16,20,24], bootstrap: [True, False]
>>>>>>> The best were max_features=5, max_depth=30, min_samples_split=20,
>>>>>>> bootstrap=True
>>>>>>> Best score = 0.8722
>>>>>>>
>>>>>>> Again, I searched close to the parameters that were best:
>>>>>>> 3) max_features: [2,4,6], max_depth: [25,35,40,50],
>>>>>>> min_samples_split: [22,28,34,40], bootstrap: [True, False]
>>>>>>> The best were max_features=4, max_depth=25, min_samples_split=22,
>>>>>>> bootstrap=True
>>>>>>> Best score = 0.8725
>>>>>>>
>>>>>>> Then I used GridSearch among the best parameters found in the above
>>>>>>> runs and found the best one to be max_features=4, max_depth=15,
>>>>>>> min_samples_split=10.
>>>>>>> Best score = 0.8729
>>>>>>>
>>>>>>> Then I used these parameters to predict for an unknown dataset but
>>>>>>> got a very low score (around 0.72).
>>>>>>>
>>>>>>> My questions are:
>>>>>>> 1) Am I doing the hyperparameter tuning correctly, or am I missing
>>>>>>> something?
>>>>>>>
>>>>>>> 2) Why is my testing score so low compared to my training and
>>>>>>> validation scores, and how can I improve it so that I get good predictions
>>>>>>> out of my model?
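
One way to look at the second question is to print the training, cross-validation and held-out scores side by side. The sketch below is only an assumption-laden illustration: the data is synthetic, n_estimators is cut to 50 to keep it fast, and the grid is a reduced version of search 1) above. It does not reproduce the original dataset or its scores.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=400, n_features=8, noise=20.0, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_features": [1, 3, 5], "max_depth": [1, 5, 10, 15]},
    cv=5,
)
search.fit(X_train, y_train)

train_r2 = search.score(X_train, y_train)  # refit model on its own training data
cv_r2 = search.best_score_                 # mean R^2 over the 5 validation folds
test_r2 = search.score(X_test, y_test)     # R^2 on the held-out set

print("best params:", search.best_params_)
print(f"train R^2 = {train_r2:.3f}, CV R^2 = {cv_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A training score well above the cross-validation score points towards overfitting. A cross-validation score well above the test score may instead mean the test data differs from the training data.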
>>>>>>>
>>>>>>> Sorry if these are basic questions; I am new to scikit-learn and
>>>>>>> ML.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application
>>>>>>> Performance
>>>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>>>> Monitor end-to-end web transactions and take corrective actions now
>>>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>>>>>> _______________________________________________
>>>>>>> Scikit-learn-general mailing list
>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>>
