Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
If you set the random state and use the same parameters, you are expected to get exactly the same model. To be concrete, if you do

    est_1 = GradientBoostingClassifier(random_state=0)
    est_1.fit(X, y)
    est_2 = GradientBoostingClassifier(random_state=0)
    est_2.fit(X, y)
    est_3 = GradientBoostingClassifier(r
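A minimal runnable sketch of the comparison Arnaud describes; the synthetic make_classification data and the max_features=6 value stand in for the original dataset, which is not part of this message:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Synthetic data standing in for the challenge features.
    X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

    # Same parameters + same random_state -> identical models.
    est_1 = GradientBoostingClassifier(max_features=6, random_state=0).fit(X, y)
    est_2 = GradientBoostingClassifier(max_features=6, random_state=0).fit(X, y)

    # The predicted probabilities match exactly.
    assert np.allclose(est_1.predict_proba(X), est_2.predict_proba(X))

    # With a different seed (or no seed at all), the feature subsampling and
    # tie-breaking differ, so the fitted model can change from run to run.
    est_3 = GradientBoostingClassifier(max_features=6, random_state=1).fit(X, y)
    print(np.abs(est_1.predict_proba(X) - est_3.predict_proba(X)).max())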

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Thanks Arnaud, got it. Essentially what you are saying is: while training classifier A, imagine there was a tie at estimator 3 between two feature sets, e.g. S1 [12,3,4,5,6] and S2 [2,3,4,5,6,7], and S1 was chosen. While training classifier B, there was a tie again at estimator 3 on the same sets and S2 was

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
During the growth of the decision tree, the best split is searched for in a subset of max_features features sampled from all features. Setting the random_state allows you to draw the same subsets of features each time. Note that if several candidate splits have the same score, ties are broken randomly. Setting t
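A small sketch of this point on synthetic data (the dataset and parameters are assumptions, not from the thread): inspecting which feature the first boosting tree splits on shows that fits with the same random_state always agree, while different seeds can pick different splits because they draw different max_features subsets.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=500, n_features=30, random_state=7)

    def root_feature(seed):
        """Return the feature index used at the root of the first boosting tree."""
        est = GradientBoostingClassifier(n_estimators=10, max_features=6,
                                         random_state=seed).fit(X, y)
        # estimators_ is an array of regression trees; [0, 0] is the first one.
        return est.estimators_[0, 0].tree_.feature[0]

    # Same seed -> same candidate feature subsets -> same root split.
    assert root_feature(0) == root_feature(0)

    # Different seeds can lead to a different root split, and hence a
    # different model overall.
    print(root_feature(0), root_feature(1))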

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Agreed, Gilles, which is why I later changed to max_features=None. But 6 is a good value: sqrt(30) ~= sqrt(36) = 6, and we had 30 features. Generally speaking, if I have 100 estimators (this is from previous experience and also the auto setting on your GBC) and 30 features, 6 should be a good start. But
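A quick illustration of the square-root rule of thumb mentioned here (just the arithmetic, not the library's exact rounding rule):

    import math

    n_features = 30
    # sqrt(30) is about 5.48, so 6 (= sqrt(36)) is a reasonable integer
    # starting point for max_features when there are 30 features.
    print(math.sqrt(n_features))  # ~5.477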

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Gilles Louppe
Hi Deb, In your case, randomness comes from the max_features=6 setting, which makes the model not very stable from one execution to another, since the original dataset includes about 5x more input variables. Gilles
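A hedged sketch of how one might measure the run-to-run variation Gilles describes; the synthetic data, train/test split, and score spread are illustrative assumptions, not results from the thread:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                               random_state=3)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

    # Refit several times without fixing random_state: with max_features=6
    # out of 30, each run samples different feature subsets at every split,
    # so the test accuracy fluctuates from one execution to another.
    scores = [
        GradientBoostingClassifier(n_estimators=100, max_features=6)
        .fit(X_tr, y_tr)
        .score(X_te, y_te)
        for _ in range(5)
    ]
    print(min(scores), max(scores))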

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Thanks Arnaud. random_state is not listed as a parameter on the http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html page, but it is listed as an argument in the constructor. It's probably my fault that I did not notice it as a passable parameter. Maybe th

Re: [Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Arnaud Joly
Hi, To get a reproducible model, you have to set the random_state. Best regards, Arnaud

[Scikit-learn-general] Unpredictability of GradientBoosting

2014-09-16 Thread Debanjan Bhattacharyya
Hi, I recently participated in the Atlas (Higgs Boson Machine Learning Challenge). One of the models I tried was GradientBoostingClassifier. I found it extremely non-deterministic. So if I use

    est = GradientBoostingClassifier(n_estimators=100, max_depth=10, min_samples_leaf=20, max_features=6, verbose
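A sketch reproducing the behaviour described above, using the parameters visible in the message (the verbose value is cut off, and the synthetic data merely stands in for the challenge set):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Stand-in data; the actual challenge had roughly 30 physics features.
    X, y = make_classification(n_samples=2000, n_features=30, random_state=11)

    params = dict(n_estimators=100, max_depth=10, min_samples_leaf=20,
                  max_features=6)

    # Two fits with identical parameters but no random_state: because
    # max_features < n_features, the feature subsampling differs per run
    # and the resulting models generally disagree.
    est_a = GradientBoostingClassifier(**params).fit(X, y)
    est_b = GradientBoostingClassifier(**params).fit(X, y)
    print(np.mean(est_a.predict(X) != est_b.predict(X)))

    # Adding random_state=0 to params makes the two fits identical.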