Hi all,

When I was implementing a bagging classifier based on scikit-learn's
DecisionTreeClassifier, I noticed that the results were not deterministic and
found that this was due to the random_state in the DecisionTreeClassifier
(which is set to None by default).
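For context, here is a minimal sketch of how a hand-rolled bagging loop can be made reproducible. It assumes a single ensemble-level seed that drives both the bootstrap sampling and a per-tree integer random_state. The helpers fit_bagging and predict_bagging are hypothetical names for illustration, not scikit-learn API.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=10, random_state=None):
    """Hypothetical minimal bagging ensemble of decision trees (sketch)."""
    rng = np.random.RandomState(random_state)
    trees = []
    for _ in range(n_estimators):
        # Bootstrap sample drawn from the ensemble-level RNG.
        idx = rng.randint(0, X.shape[0], X.shape[0])
        # Derive a per-tree seed from the same RNG, so the tree's own
        # internal randomness is reproducible as well.
        seed = rng.randint(np.iinfo(np.int32).max)
        tree = DecisionTreeClassifier(random_state=seed)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_bagging(trees, X):
    """Majority vote over the trees (assumes non-negative integer labels)."""
    votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_iris(return_X_y=True)
pred_a = predict_bagging(fit_bagging(X, y, random_state=42), X)
pred_b = predict_bagging(fit_bagging(X, y, random_state=42), X)
print((pred_a == pred_b).all())  # True: same seed, same ensemble
```

With random_state=None in both places, two runs may give different ensembles, which matches the behavior described above.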

I am wondering what exactly this random state is used for. I can imagine it
being used for resolving ties if the information gain for multiple features is
the same, or it could be that the splits of continuous features are chosen
differently? (I thought the heuristic is to sort the feature values and to
consider only the thresholds between adjacent values associated with examples
that have different class labels -- but is there maybe some random
subselection involved?)
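To make the two hypotheses concrete, here is a small pure-Python sketch. candidate_thresholds implements the boundary-point heuristic described above (midpoints between adjacent distinct values, skipped only when both sides are pure and share the same label), and pick_feature shows how a shuffled feature order would resolve ties in information gain. Both functions are hypothetical illustrations of the ideas in this question, not scikit-learn's actual Cython splitter.

```python
import random
from itertools import groupby


def candidate_thresholds(values, labels):
    """Midpoints between adjacent distinct values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    # Set of class labels observed at each distinct feature value.
    groups = [(v, {lab for _, lab in grp})
              for v, grp in groupby(pairs, key=lambda p: p[0])]
    thresholds = []
    for (v_lo, labs_lo), (v_hi, labs_hi) in zip(groups, groups[1:]):
        # No class boundary: both neighbours are pure with the same label.
        if len(labs_lo) == 1 and labs_lo == labs_hi:
            continue
        thresholds.append((v_lo + v_hi) / 2)
    return thresholds


def pick_feature(gains, rng):
    """Index of a max-gain feature, visiting features in a shuffled order.

    Only strictly larger gains replace the current best, so among tied
    features the winner is whichever one the shuffled order visits first.
    """
    order = list(range(len(gains)))
    rng.shuffle(order)
    best = order[0]
    for j in order[1:]:
        if gains[j] > gains[best]:
            best = j
    return best


print(candidate_thresholds([2.0, 1.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # [2.5]
print(pick_feature([0.5, 0.9, 0.9], random.Random(0)))  # 1 or 2, fixed by the seed
```

Under this sketch, a fixed seed makes the tie-breaking reproducible, while an unseeded RNG can pick a different but equally good feature on each run.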

If someone knows more about where exactly the random_state is used, I'd be
happy to hear it :)

Also, we could then maybe add the info to the DecisionTreeClassifier's 
docstring, which is currently a bit too generic to be useful, I think:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py


    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.
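The three cases in that docstring can be checked directly with sklearn.utils.check_random_state, which, as far as I know, is the helper scikit-learn estimators use to turn a random_state argument into a RandomState instance. The sketch below only illustrates the int / instance / None semantics. It does not say where the tree uses the resulting generator.

```python
import numpy as np
from sklearn.utils import check_random_state

# int: a fresh RandomState seeded with that int, so repeated calls agree.
a = check_random_state(0).randint(1000, size=5)
b = check_random_state(0).randint(1000, size=5)
print(np.array_equal(a, b))  # True

# RandomState instance: returned as-is, so its state is shared and advances.
rng = np.random.RandomState(0)
print(check_random_state(rng) is rng)  # True

# None: the global RandomState behind np.random, so np.random.seed() affects it.
np.random.seed(0)
c = check_random_state(None).randint(1000, size=5)
print(np.array_equal(a, c))  # True
```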


Best,
Sebastian
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
