If you set the random state and use the same parameters, you should get
exactly the same model. To be concrete, if you do
est_1 = GradientBoostingClassifier(random_state=0)
est_1.fit(X, y)
est_2 = GradientBoostingClassifier(random_state=0)
est_2.fit(X, y)
est_3 = GradientBoostingClassifier(r
Thanks Arnaud
Got it.
Essentially, what you are saying is:
While training classifier A, imagine there was a tie at estimator 3 between
two feature sets, e.g. S1[12,3,4,5,6] and S2[2,3,4,5,6,7], and S1 was chosen.
While training classifier B, there was a tie again at estimator 3 on the
same sets and S2 was
During the growth of the decision tree, the best split is searched in a subset
of max_features sampled among all features.
Setting the random_state ensures that the same subsets of features are drawn
each time.
Note that if several candidate splits have the same score, ties are broken
randomly. Setting t
Agreed, Gilles.
That is why I later changed to max_features=None. But 6 is a reasonable value:
sqrt(36) ~= sqrt(30), and we had 30 features.
Generally speaking, with 100 estimators (this is from previous experience,
and also the auto setting on your GBC) and 30 features, 6 should be a good
start.
But
Hi Deb,
In your case, randomness comes from the max_features=6 setting, which
makes the model not very stable from one execution to another, since
the original dataset includes about 5x more input variables.
Gilles
On 16 September 2014 12:40, Debanjan Bhattacharyya wrote:
> Thanks Arnaud
>
> ra
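As a rough illustration of Gilles's point, the sketch below fits the same configuration with two different seeds and measures how far the predicted probabilities drift apart. The data is synthetic (`make_classification` with 30 features, standing in for the challenge data), and `max_prob_gap` is a hypothetical helper name introduced only for this example; the small `n_estimators` and `max_depth` are chosen for speed, not taken from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the real data: 30 input features, as in the thread.
X, y = make_classification(n_samples=300, n_features=30, random_state=0)


def max_prob_gap(seed_a, seed_b, max_features=6):
    """Largest absolute difference in predicted probabilities between two
    models that differ only in their random_state (hypothetical helper)."""
    probas = []
    for seed in (seed_a, seed_b):
        est = GradientBoostingClassifier(
            n_estimators=20,
            max_depth=3,
            min_samples_leaf=20,
            max_features=max_features,  # 6 of 30 features tried per split
            random_state=seed,
        )
        est.fit(X, y)
        probas.append(est.predict_proba(X))
    return float(np.max(np.abs(probas[0] - probas[1])))


# Same seed twice versus two different seeds.
print("same seed:      ", max_prob_gap(0, 0))
print("different seeds:", max_prob_gap(0, 1))
```

With a fixed seed the gap should vanish, while different seeds draw different feature subsets at each split and will usually give a visibly different model.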
Thanks Arnaud
random_state is not listed as a parameter on the
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
page,
but it is listed as an argument in the constructor. It's probably my fault
that I did not notice it as a passable parameter. May be th
Hi,
To get a reproducible model, you have to set the random_state.
Best regards,
Arnaud
On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya wrote:
> Hi I recently participated in the Atlas (Higgs Boson Machine Learning
> Challenge)
>
> One of the models I tried was GradientBoostingClassifier. I
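Arnaud's advice to set random_state can be checked with a minimal sketch like the one below. It assumes synthetic data from `make_classification` in place of the challenge data, and uses a smaller `n_estimators` and `max_depth` than the original post so that it runs quickly; only `max_features=6` and `min_samples_leaf=20` are taken from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the challenge data (30 features, as in the thread).
X, y = make_classification(n_samples=300, n_features=30, random_state=0)

# Shared hyperparameters; values other than max_features and
# min_samples_leaf are illustrative assumptions.
params = dict(n_estimators=20, max_depth=3, min_samples_leaf=20, max_features=6)

# Two estimators with identical parameters and an identical seed.
est_1 = GradientBoostingClassifier(random_state=0, **params).fit(X, y)
est_2 = GradientBoostingClassifier(random_state=0, **params).fit(X, y)

# With the seed fixed, both fits should yield the same predictions.
same = np.allclose(est_1.predict_proba(X), est_2.predict_proba(X))
print("identical predictions:", same)
```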
Hi, I recently participated in the ATLAS Higgs Boson Machine Learning
Challenge.
One of the models I tried was GradientBoostingClassifier. I found it
extremely non-deterministic.
So if I use
est = GradientBoostingClassifier(n_estimators=100,
max_depth=10,min_samples_leaf=20,max_features=6,verbose