Luca, I'm not sure I understand what you're saying. All test sets have
different information than their training sets--why does that mean
shuffling would help? Algorithmically, the tree re-sorts the data anyway,
without caring about the order it was in originally.
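
To make the sorting point concrete, here is a minimal sketch. It is a simplified illustration, not scikit-learn's actual splitter code: the helper `candidate_thresholds` is hypothetical and only mimics the idea that split candidates for a feature come from its sorted values, so row order cannot affect them.

```python
import numpy as np


def candidate_thresholds(feature_values):
    """Midpoints between consecutive distinct sorted values of one feature.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    distinct = np.unique(feature_values)  # np.unique returns sorted values
    return (distinct[:-1] + distinct[1:]) / 2.0


rng = np.random.RandomState(0)
x = rng.normal(size=20)          # one feature column, original row order
x_shuffled = rng.permutation(x)  # same values, shuffled row order

# The candidate thresholds are identical regardless of row order.
same = np.array_equal(candidate_thresholds(x), candidate_thresholds(x_shuffled))
print(same)
```

Note that this only covers how a single tree picks thresholds. With bootstrapping, the forest also draws a random sample of rows, which is random whatever the original order was.
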
On Fri, Feb 5, 2016 at 5:50 PM, Luca Puggini <lucapug...@gmail.com> wrote:
> @muhammad by number of variables at each split I mean 'max_features'.
>
> On Sat, Feb 6, 2016 at 1:45 AM Luca Puggini <lucapug...@gmail.com> wrote:
>
>> If I understood correctly, he is using a train set for model
>> identification and training. A test set is then used to evaluate the
>> results. If he gets good performance on the train set and bad performance
>> on the test set, it may be because the test set contains different
>> information from the train set. This is common, for example, in time
>> series.
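
The time-series case mentioned above can be sketched as follows. This assumes a scikit-learn version that provides `sklearn.model_selection` (the thread predates it), and the 12-row array is made-up toy data. It contrasts a time-ordered split, where every test row comes after all training rows, with a shuffled split, where past and future rows are mixed.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # toy data: 12 rows in time order

# Time-ordered folds: training rows always precede test rows.
ordered_folds = list(TimeSeriesSplit(n_splits=3).split(X))

# Shuffled folds: rows are assigned to folds at random.
shuffled_folds = list(KFold(n_splits=3, shuffle=True, random_state=0).split(X))

for name, folds in (("time-ordered", ordered_folds), ("shuffled", shuffled_folds)):
    for train_idx, test_idx in folds:
        print(name, "train:", train_idx, "test:", test_idx)
```

If the held-out data comes from a later period than the training data, the shuffled folds may give a more optimistic validation score than the time-ordered ones.
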
>> On Fri 5 Feb 2016 at 21:43 Jacob Schreiber <jmschreibe...@gmail.com>
>> wrote:
>>
>>> I'm a bit unclear what you expect shuffling the data to do, Luca, since
>>> you end up taking a random sample if you bootstrap and re-ordering it
>>> anyway.
>>>
>>> Jacob
>>>
>>> On Fri, Feb 5, 2016 at 1:32 PM, muhammad waseem <
>>> m.waseem.ah...@gmail.com> wrote:
>>>
>>>> Hi Luca,
>>>> Thanks for your time and answer. I will try this with a lower max_depth
>>>> (for both randomised trees and RF) to see what happens.
>>>> By number of variables used at each split, you mean min_samples_split,
>>>> right?
>>>>
>>>> I did not use OOB score.
>>>> I will also try shuffling my data.
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Fri, Feb 5, 2016 at 8:46 PM, Luca Puggini <lucapug...@gmail.com>
>>>> wrote:
>>>>
>>>>> The number of trees (n_estimators) should be as large as possible.
>>>>> It does not cause overfitting. In random forest, overfitting is
>>>>> usually caused by the depth and by variables with many unique
>>>>> values. I suggest you start with randomized trees with low depth.
>>>>> If you want to use RF, you can try reducing the number of variables
>>>>> used at each split.
>>>>>
>>>>> Observe that if you use OOB to estimate the prediction error it may be
>>>>> biased when the number of trees is large.
>>>>>
>>>>> In addition, I suggest shuffling the data at the beginning if
>>>>> you can.
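
As a rough sketch of the advice above (fewer variables per split, OOB as an error estimate), the following fits forests with different `max_features` values and records the out-of-bag score for each. The synthetic data from `make_regression` and all parameter values are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used here only for illustration.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=5, noise=10.0, random_state=0
)

oob_scores = {}
for max_features in (2, 4, 8):
    forest = RandomForestRegressor(
        n_estimators=200,
        max_features=max_features,  # number of variables tried at each split
        oob_score=True,             # score each row using trees that did not see it
        random_state=0,
    )
    forest.fit(X, y)
    oob_scores[max_features] = forest.oob_score_

for max_features, score in oob_scores.items():
    print(f"max_features={max_features}: OOB R^2 = {score:.3f}")
```

Which `max_features` value scores best depends on the data, so this is something to check rather than assume.
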
>>>>>
>>>>> On Fri, Feb 5, 2016, 5:14 PM muhammad waseem <m.waseem.ah...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Luca, I will give it a try. When you say extremely randomised,
>>>>>> does this mean using a large value of n_estimators?
>>>>>>
>>>>>> Also, any idea how to solve the overfitting problem for random forest?
>>>>>>
>>>>>> Regards
>>>>>> Waseem
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 5:00 PM, Luca Puggini <lucapug...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here are the extra trees:
>>>>>>> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor
>>>>>>>
>>>>>>> They work similarly to random forest. In my experience, RF often
>>>>>>> tends to overfit.
>>>>>>> I suggest you start with the default parameters and cross-validate
>>>>>>> only on the max_depth parameter. Start with small values of
>>>>>>> max_depth [2, 3, 5, 7, 10] and check how the performance of the model
>>>>>>> changes.
>>>>>>>
>>>>>>> Good Luck.
>>>>>>> Luca
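
The suggestion above could look roughly like the sketch below. It assumes a scikit-learn version with `sklearn.model_selection`, and the synthetic data and `n_estimators=100` are placeholders rather than values from this thread. Only `max_depth` is searched; everything else stays at its default.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, used here only for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [2, 3, 5, 7, 10]},  # the values suggested above
    cv=5,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
mean_scores = search.cv_results_["mean_test_score"]

# Show how the mean cross-validated R^2 changes with depth.
for depth, score in zip([2, 3, 5, 7, 10], mean_scores):
    print(f"max_depth={depth}: mean CV R^2 = {score:.3f}")
print("best max_depth:", best_depth)
```
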
>>>>>>>
>>>>>>> On Fri, Feb 5, 2016 at 4:28 PM muhammad waseem <
>>>>>>> m.waseem.ah...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Luca,
>>>>>>>> Could you please explain how I can use these randomized trees in
>>>>>>>> scikit-learn? So do you suggest I should be using random forest?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2016 at 4:13 PM, Luca Puggini <lucapug...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> To me the score is not so low. The model is slightly overfitting.
>>>>>>>>> Try repeating the same process with extremely randomized trees
>>>>>>>>> instead of random forest, and try to keep a low depth.
>>>>>>>>> On Fri 5 Feb 2016 at 16:01 muhammad waseem <
>>>>>>>>> m.waseem.ah...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dear All,
>>>>>>>>>> I am trying to train my model using scikit-learn's random forest
>>>>>>>>>> (regression) and have tried to use GridSearch with cross-validation
>>>>>>>>>> (CV=5) to tune hyperparameters. I fixed n_estimators=2000 for all
>>>>>>>>>> cases. Below are the searches that I performed.
>>>>>>>>>>
>>>>>>>>>> 1) max_features: [1,3,5], max_depth: [1,5,10,15],
>>>>>>>>>> min_samples_split: [2,6,8,10], bootstrap: [True, False]
>>>>>>>>>> The best were max_features=5, max_depth=15,
>>>>>>>>>> min_samples_split=10, bootstrap=True
>>>>>>>>>> Best score = 0.8724
>>>>>>>>>>
>>>>>>>>>> Then I searched close to the parameters that were best:
>>>>>>>>>> 2) max_features: [3,5,6], max_depth: [10,20,30,40],
>>>>>>>>>> min_samples_split: [8,16,20,24], bootstrap: [True, False]
>>>>>>>>>> The best were max_features=5, max_depth=30,
>>>>>>>>>> min_samples_split=20, bootstrap=True
>>>>>>>>>> Best score = 0.8722
>>>>>>>>>>
>>>>>>>>>> Again, I searched close to the parameters that were best:
>>>>>>>>>> 3) max_features: [2,4,6], max_depth: [25,35,40,50],
>>>>>>>>>> min_samples_split: [22,28,34,40], bootstrap: [True, False]
>>>>>>>>>> The best were max_features=4, max_depth=25,
>>>>>>>>>> min_samples_split=22, bootstrap=True
>>>>>>>>>> Best score = 0.8725
>>>>>>>>>>
>>>>>>>>>> Then I used GridSearch among the best parameters found in the
>>>>>>>>>> above runs and found the best one as max_features=4, max_depth=15,
>>>>>>>>>> min_samples_split=10
>>>>>>>>>> Best score = 0.8729
>>>>>>>>>>
>>>>>>>>>> Then I used these parameters to predict for an unknown dataset
>>>>>>>>>> but got a very low score (around 0.72).
>>>>>>>>>>
>>>>>>>>>> My questions are:
>>>>>>>>>> 1) Am I doing the hyperparameter tuning correctly, or am I missing
>>>>>>>>>> something?
>>>>>>>>>>
>>>>>>>>>> 2) Why is my testing score so low compared to my training
>>>>>>>>>> and validation scores, and how can I improve it so that I get good
>>>>>>>>>> predictions out of my model?
>>>>>>>>>>
>>>>>>>>>> Sorry if these are basic questions; I am new to scikit-learn
>>>>>>>>>> and ML.
>>>>>>>>>>
>>>>>>>>>> Thanks!
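
One way to check the gap described above is to keep a test set aside before tuning and compare the best cross-validation score with the score on that held-out set. The sketch below assumes a scikit-learn version with `sklearn.model_selection`. The synthetic data, the small grid, and `n_estimators=50` are placeholders chosen to run quickly, not the settings from the original post.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data, used here only for illustration.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={
        "max_features": [2, 4],
        "max_depth": [5, 15],
        "min_samples_split": [2, 10],
    },
    cv=5,
)
search.fit(X_train, y_train)

cv_score = search.best_score_                # mean CV R^2 of the best setting
test_score = search.score(X_test, y_test)    # R^2 on the held-out rows

print("best params:", search.best_params_)
print(f"CV score: {cv_score:.3f}, held-out score: {test_score:.3f}")
```

When both sets come from the same distribution, as in this synthetic case, the two scores are usually close. A much lower held-out score may indicate that the unseen data differs from the training data, as discussed elsewhere in this thread.
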
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application
>>>>>>>>>> Performance
>>>>>>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>>>>>>> Monitor end-to-end web transactions and take corrective actions
>>>>>>>>>> now
>>>>>>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Scikit-learn-general mailing list
>>>>>>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Sent by mobile phone
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>