@muhammad: by "number of variables at each split" I mean 'max_features'.
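
To make the distinction concrete, here is a minimal sketch on synthetic data. The dataset and the specific values are assumptions for illustration only, not the data or settings from this thread.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration (not the data discussed here).
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=50,
    max_features=2,        # number of candidate variables examined at each split
    min_samples_split=10,  # different knob: minimum samples a node needs to be split
    random_state=0,
)
rf.fit(X, y)

# Every fitted tree reports how many features it considers per split.
print({tree.max_features_ for tree in rf.estimators_})
```

Lowering max_features makes the individual trees less correlated, while min_samples_split only limits how far each tree keeps splitting.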
On Sat, Feb 6, 2016 at 1:45 AM Luca Puggini <lucapug...@gmail.com> wrote:
> If I understood correctly, he is using a train set for model
> identification and training, and a test set to evaluate the
> results. If he gets good performance on the train set and bad performance
> on the test set, it may be because the test set contains different
> information from the train set. This is common, for example, in time
> series.
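
If the data really are time-ordered (an assumption; the thread does not say so), a cross-validation scheme that respects the ordering may give validation scores closer to what the test set shows. A rough sketch on synthetic data, assuming a scikit-learn version that provides sklearn.model_selection.TimeSeriesSplit:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic stand-in for time-ordered data (rows are assumed to be in time order).
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.1 * rng.normal(size=300)

# Each split trains on an initial segment and validates on the segment after it.
cv = TimeSeriesSplit(n_splits=5)
splits = list(cv.split(X))

model = RandomForestRegressor(n_estimators=50, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores)
```

With a plain k-fold split, validation rows can come from before the training rows, which can hide this kind of train/test mismatch.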
> On Fri 5 Feb 2016 at 21:43 Jacob Schreiber <jmschreibe...@gmail.com>
> wrote:
>
>> I'm a bit unclear what you expect shuffling the data to do, Luca, since
>> you end up taking a random sample if you bootstrap and re-ordering it
>> anyway.
>>
>> Jacob
>>
>> On Fri, Feb 5, 2016 at 1:32 PM, muhammad waseem <m.waseem.ah...@gmail.com
>> > wrote:
>>
>>> Hi Luca,
>>> Thanks for your time and answer. I will try this with a lower max_depth
>>> (for both randomised trees and RF, to see what happens).
>>> By number of variables used at each split, you mean min_samples_split,
>>> right?
>>>
>>> I did not use the OOB score.
>>> I will also try shuffling my data.
>>>
>>> Thanks again.
>>>
>>>
>>> On Fri, Feb 5, 2016 at 8:46 PM, Luca Puggini <lucapug...@gmail.com>
>>> wrote:
>>>
>>>> The number of trees (n_estimators) should be as large as
>>>> possible. It does not cause overfitting. In random forests, overfitting
>>>> is usually caused by the depth and by variables with several unique
>>>> values. I suggest you start with randomized trees of low depth.
>>>> If you want to use RF, you can try reducing the number of variables used
>>>> at each split.
>>>>
>>>> Note that if you use OOB to estimate the prediction error, it may be
>>>> biased when the number of trees is large.
>>>>
>>>> In addition, I suggest you shuffle the data at the beginning if
>>>> you can.
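
A small sketch of the two suggestions above: shuffling the rows once up front and reading the OOB estimate for different forest sizes. The synthetic data, the tree counts, and max_features=3 are assumptions chosen only to keep the example quick.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.utils import shuffle

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

# Shuffle rows once, keeping X and y aligned.
X, y = shuffle(X, y, random_state=0)

oob_scores = {}
for n_trees in (50, 200):
    rf = RandomForestRegressor(
        n_estimators=n_trees,
        max_features=3,   # fewer variables per split than the total of 8
        bootstrap=True,   # OOB estimates require bootstrap sampling
        oob_score=True,
        random_state=0,
    )
    rf.fit(X, y)
    oob_scores[n_trees] = rf.oob_score_  # R^2 on out-of-bag samples

print(oob_scores)
```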
>>>>
>>>> On Fri, Feb 5, 2016, 5:14 PM muhammad waseem <m.waseem.ah...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Luca, I will give it a try. When you say extremely randomised,
>>>>> does this mean using a large n_estimators?
>>>>>
>>>>> Also, any idea how to solve the overfitting problem for random forest?
>>>>>
>>>>> Regards
>>>>> Waseem
>>>>>
>>>>>
>>>>> On Fri, Feb 5, 2016 at 5:00 PM, Luca Puggini <lucapug...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here are the extra trees:
>>>>>> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor
>>>>>>
>>>>>> They work similarly to random forest. In my experience RF often tends
>>>>>> to overfit.
>>>>>> I suggest you start with the default parameters and cross-validate
>>>>>> only the max_depth parameter. Start with small values of
>>>>>> max_depth [2, 3, 5, 7, 10] and check how the performance of the model
>>>>>> changes.
>>>>>>
>>>>>> Good Luck.
>>>>>> Luca
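
The suggestion above could look roughly like this. It is a sketch on synthetic data; n_estimators=50 and cv=5 are assumptions chosen for speed, and only the max_depth values come from the message.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

# Defaults everywhere except max_depth, which is the only parameter searched.
search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 3, 5, 7, 10]},
    cv=5,
)
search.fit(X, y)

# Mean cross-validated R^2 for each depth, to see how performance changes.
depths = search.cv_results_["param_max_depth"]
means = search.cv_results_["mean_test_score"]
for depth, mean_score in zip(depths, means):
    print(depth, round(mean_score, 3))
print("best:", search.best_params_)
```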
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 4:28 PM muhammad waseem <
>>>>>> m.waseem.ah...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>> Could you please explain how I can use these randomized trees in
>>>>>>> scikit-learn? So you suggest I should be using Random forest?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 5, 2016 at 4:13 PM, Luca Puggini <lucapug...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To me the score is not so low. The model is slightly overfitting.
>>>>>>>> Try to repeat the same process with extremely randomized trees instead
>>>>>>>> of random forest, and try to keep a low depth.
>>>>>>>> On Fri 5 Feb 2016 at 16:01 muhammad waseem <
>>>>>>>> m.waseem.ah...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear All,
>>>>>>>>> I am trying to train my model using scikit-learn's Random forest
>>>>>>>>> (Regression) and have tried to use GridSearch with cross-validation
>>>>>>>>> (CV=5) to tune hyperparameters. I fixed n_estimators=2000 for all
>>>>>>>>> cases. Below are the few searches that I performed.
>>>>>>>>>
>>>>>>>>> 1) max_features: [1,3,5], max_depth: [1,5,10,15],
>>>>>>>>> min_samples_split: [2,6,8,10], bootstrap: [True, False]
>>>>>>>>> The best were max_features=5, max_depth=15,
>>>>>>>>> min_samples_split=10, bootstrap=True
>>>>>>>>> Best score = 0.8724
>>>>>>>>>
>>>>>>>>> Then I searched close to the parameters that were best:
>>>>>>>>> 2) max_features: [3,5,6], max_depth: [10,20,30,40],
>>>>>>>>> min_samples_split: [8,16,20,24], bootstrap: [True, False]
>>>>>>>>> The best were max_features=5, max_depth=30,
>>>>>>>>> min_samples_split=20, bootstrap=True
>>>>>>>>> Best score = 0.8722
>>>>>>>>>
>>>>>>>>> Again, I searched close to the parameters that were best:
>>>>>>>>> 3) max_features: [2,4,6], max_depth: [25,35,40,50],
>>>>>>>>> min_samples_split: [22,28,34,40], bootstrap: [True, False]
>>>>>>>>> The best were max_features=4, max_depth=25,
>>>>>>>>> min_samples_split=22, bootstrap=True
>>>>>>>>> Best score = 0.8725
>>>>>>>>>
>>>>>>>>> Then I used GridSearch among the best parameters found in the
>>>>>>>>> above runs and found the best one to be max_features=4, max_depth=15,
>>>>>>>>> min_samples_split=10
>>>>>>>>> Best score = 0.8729
>>>>>>>>>
>>>>>>>>> Then I used these parameters to predict for an unknown dataset but
>>>>>>>>> got a very low score (around 0.72).
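
For reference, the workflow described above can be sketched as follows, with the training, cross-validation, and held-out scores printed side by side. The data are synthetic, and the reduced grid and n_estimators=20 are assumptions to keep the run short; they are not the settings from the original post.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=400, n_features=6, noise=20.0, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

param_grid = {
    "max_features": [1, 3, 5],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid,
    cv=5,
)
search.fit(X_train, y_train)  # refits the best model on all of X_train

train_r2 = search.score(X_train, y_train)  # score on the data used for fitting
cv_r2 = search.best_score_                 # mean cross-validated score
test_r2 = search.score(X_test, y_test)     # score on the held-out set
print(search.best_params_)
print(train_r2, cv_r2, test_r2)
```

A large gap between the cross-validated score and the held-out score, as reported here (about 0.87 versus 0.72), is what this comparison is meant to expose.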
>>>>>>>>>
>>>>>>>>> My questions are:
>>>>>>>>> 1) Am I doing the hyperparameter tuning correctly, or am I missing
>>>>>>>>> something?
>>>>>>>>>
>>>>>>>>> 2) Why is my testing score very low compared to my training and
>>>>>>>>> validation scores, and how can I improve it so that I get good
>>>>>>>>> predictions out of my model?
>>>>>>>>>
>>>>>>>>> Sorry if these are basic questions; I am new to scikit-learn
>>>>>>>>> and ML.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application
>>>>>>>>> Performance
>>>>>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>>>>>> Monitor end-to-end web transactions and take corrective actions now
>>>>>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>>>>>>>> _______________________________________________
>>>>>>>>> Scikit-learn-general mailing list
>>>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Sent by mobile phone