Hi Sebastian,

I think the random state is used to select the features that go into each split (look at the `max_features` parameter).
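A quick way to check this is to fit the same tree several times and compare which feature ends up at the root. The snippet below is only a sketch under my own assumptions: the synthetic dataset, the helper name `root_feature`, and the choice of `max_features=2` are made up for illustration and are not from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)


def root_feature(seed, max_features):
    """Fit a tree and return the index of the feature used at the root split."""
    clf = DecisionTreeClassifier(max_features=max_features, random_state=seed)
    clf.fit(X, y)
    return int(clf.tree_.feature[0])


# Same seed: the candidate features drawn at each split are the same,
# so the root feature is reproduced on every fit.
same_seed = [root_feature(0, 2) for _ in range(3)]

# Different seeds: only 2 of the 10 features are considered per split,
# so the root feature can change from one seed to the next.
diff_seeds = [root_feature(seed, 2) for seed in range(10)]

print("same seed:      ", same_seed)
print("different seeds:", diff_seeds)
```

As far as I understand the documentation, the features are also randomly permuted at each split when `max_features` is left at its default, so the seed can still matter when several splits give exactly the same improvement.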
Cheers,
Javier

On Sun, Oct 28, 2018 at 12:07 AM Sebastian Raschka <m...@sebastianraschka.com> wrote:

> Hi all,
>
> when I was implementing a bagging classifier based on scikit-learn's
> DecisionTreeClassifier, I noticed that the results were not deterministic
> and found that this was due to the random_state in the
> DecisionTreeClassifier (which is set to None by default).
>
> I am wondering what exactly this random state is used for? I can imagine
> it being used for resolving ties if the information gain for multiple
> features is the same, or it could be that the feature splits of continuous
> features are different? (I thought the heuristic is to sort the features and
> to consider those feature values next to each other that are associated
> with examples that have different class labels -- but is there maybe some
> random subselection involved?)
>
> If someone knows more about this, where the random_state is used, I'd be
> happy to hear it :)
>
> Also, we could then maybe add the info to the DecisionTreeClassifier's
> docstring, which is currently a bit too generic to be useful, I think:
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py
>
>     random_state : int, RandomState instance or None, optional (default=None)
>         If int, random_state is the seed used by the random number generator;
>         If RandomState instance, random_state is the random number generator;
>         If None, the random number generator is the RandomState instance used
>         by `np.random`.
>
> Best,
> Sebastian
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn