Thanks, Arnaud.
random_state is not listed as a parameter on the
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
page, but it is listed as an argument in the constructor. It is probably my
fault that I did not notice it as a passable parameter. Maybe the
documentation can be changed.
In hindsight, and as a general question: if I am training without
random_state, why and when would the boosted models vary so much? (I have
seen data sets where they don't.)
And what is the right approach to getting a stable CV estimate? Leaving
random_state unset, doing several rounds of CV, and averaging them? Or
using a different random_state for each round of CV and averaging?
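To make the second option concrete, here is a minimal sketch of what I mean by several seeded rounds of CV that are then averaged. It assumes a recent scikit-learn (where sklearn.model_selection exists); the synthetic data, the three seeds, and the small model settings are placeholders chosen only to keep the example quick, not values from my actual runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real training set (assumption: 30 features).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           random_state=0)

seeds = [0, 1, 2]  # hypothetical seeds, one per CV round
round_means = []
for seed in seeds:
    # Seed both the model and the fold assignment so each round is repeatable.
    est = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                     max_features=6, random_state=seed)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    scores = cross_val_score(est, X, y, cv=cv)  # accuracy by default
    round_means.append(scores.mean())

print("per-round mean accuracy:", np.round(round_means, 3))
print("average over rounds: %.3f (std %.3f)"
      % (np.mean(round_means), np.std(round_means)))
```

The spread across rounds gives a rough idea of how much of the score variation comes from the seed rather than from the data.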
What exactly does random_state control in gradient boosting?
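As far as I understand, when max_features is smaller than the number of features, each split considers a random subset of features, and with subsample below 1.0 each stage also draws a random subset of rows; random_state seeds those draws. Below is a minimal sketch I would use to check this, on synthetic data rather than the Higgs set. The helper fit_predict and the seeds 42 and 7 are made up for the illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the real training set (assumption: 30 features).
X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)

def fit_predict(seed):
    """Fit on the full data with the given seed; return P(class 1) per row."""
    est = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                     min_samples_leaf=20, max_features=6,
                                     random_state=seed)
    return est.fit(X, y).predict_proba(X)[:, 1]

same_a = fit_predict(42)
same_b = fit_predict(42)   # same seed as above
other = fit_predict(7)     # different seed

print("identical with the same seed:", np.array_equal(same_a, same_b))
print("max abs difference across seeds:", np.abs(same_a - other).max())
```

If the first line prints True, the run-to-run variation I saw would be explained by the unseeded feature sampling rather than by anything in the data.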
Regards
Deb
On Tue, Sep 16, 2014 at 3:52 PM, Arnaud Joly <a.j...@ulg.ac.be> wrote:
> Hi,
>
>
> To get a reproducible model, you have to set the random_state.
>
> Best regards,
> Arnaud
>
>
> On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya <b.deban...@gmail.com>
> wrote:
>
> Hi, I recently participated in the ATLAS Higgs Boson Machine Learning
> Challenge.
>
> One of the models I tried was GradientBoostingClassifier. I found it
> extremely non-deterministic.
> So if I use
>
> from sklearn.ensemble import GradientBoostingClassifier
> est = GradientBoostingClassifier(n_estimators=100, max_depth=10,
>                                  min_samples_leaf=20, max_features=6,
>                                  verbose=1)
>
> and train several times on the same (full) training set, I end up with
> models that differ significantly in size (I mean the pickle output) and
> predict differently on the same instance. The difference is on the scale
> of 20 to 30% (I have seen values varying between 0.7x and 0.4x) on the
> same instance. Even the ordering of the top 20 features (out of 30)
> differs quite significantly from model to model.
>
> Can someone explain this uncertainty in a bit more detail?
>
> The training data set can be downloaded from here:
> https://www.kaggle.com/c/higgs-boson/data
>
>
> Thanks
>
> Regards
>
> ------------------------------------------------------------------------------
> Want excitement?
> Manually upgrade your production database.
> When you want reliability, choose Perforce.
> Perforce version control. Predictably reliable.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
>
>