Hi Kevin.

The trees have a "max_features" parameter that limits the number of features considered at each split. It is not usually used in single decision trees, but rather in random forests. If "max_features" is set, "random_state" determines which features are considered at each split.
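To make that concrete, here is a minimal sketch. The toy data, the max_features=3 and max_depth=3 settings, and the fit_tree helper are made up for illustration. With "max_features" below the total number of features, fixing "random_state" should make the fitted tree reproducible, while a different seed may pick different candidate features and so may give a different tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only): 10 features, 2 of them informative.
rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)


def fit_tree(seed):
    """Hypothetical helper: fit a small tree that considers 3 features per split."""
    tree = DecisionTreeRegressor(max_features=3, max_depth=3, random_state=seed)
    return tree.fit(X, y)


tree_a = fit_tree(0)
tree_b = fit_tree(0)  # same seed as tree_a
tree_c = fit_tree(1)  # different seed

# Same seed: the same feature subsets are drawn, so predictions match.
print(np.array_equal(tree_a.predict(X), tree_b.predict(X)))
# Different seed: the feature subsets may differ, so predictions may differ.
print(np.array_equal(tree_a.predict(X), tree_c.predict(X)))
```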

I think "random_state" is not used otherwise, but Arnauld and Gilles know better ;)

Hth,
Andy

On 10/14/2015 11:33 AM, Kevin Markham wrote:
Hello,

I'm a data science instructor who uses scikit-learn extensively in the classroom. Yesterday I was teaching decision trees, and I summarized the tree building process (for regression trees) as follows:

1. Begin at the top of the tree.
2. For every feature, examine every possible cutpoint, and choose the feature and cutpoint such that the resulting tree has the lowest possible mean squared error (MSE). Make that split.
3. Examine the two resulting regions, and again make a single split (in one of the regions) to minimize the MSE.
4. Keep repeating step 3 until a stopping criterion is met.
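
Step 2 above can be sketched in plain Python as follows. This is only an illustration of an exhaustive search under simple assumptions (midpoints between distinct sorted values as candidate cutpoints, size-weighted MSE of the two regions as the score), not scikit-learn's actual implementation. The names mse and best_split and the toy data are made up for this example.

```python
def mse(values):
    """Mean squared error of a region around its own mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y):
    """Exhaustively search every feature and every candidate cutpoint.

    Returns (feature_index, cutpoint, weighted_mse) for the best split,
    or None if no feature has two distinct values.
    """
    n = len(y)
    best = None
    for j in range(len(X[0])):
        # Candidate cutpoints: midpoints between consecutive distinct values.
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= cut]
            right = [y[i] for i in range(n) if X[i][j] > cut]
            score = (len(left) * mse(left) + len(right) * mse(right)) / n
            # Strict "<" means ties go to the first split found.
            if best is None or score < best[2]:
                best = (j, cut, score)
    return best


# Toy data: feature 0 separates the targets, feature 1 is constant.
X_toy = [[1, 5], [2, 5], [3, 5], [4, 5]]
y_toy = [0, 0, 10, 10]
print(best_split(X_toy, y_toy))
```

In this sketch, ties between equally good splits are resolved by iteration order, which is one place where an implementation could reasonably use randomness instead.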

One question that came up is why there is a random_state parameter for a DecisionTreeRegressor (or a DecisionTreeClassifier). Assuming that an exhaustive search is performed before each split (meaning that every possible cutpoint is checked for every feature), it is not obvious to me what randomness is used during the tree building process, such that a random_state is necessary.

My best guesses were that the random_state is used for tiebreaking, or perhaps that the search for the best split is not exhaustive and thus random_state affects the way in which the search is performed.

In summary, I am asking: Why is a random_state necessary for decision trees?

As a corollary, I am asking: Am I correctly representing how a decision tree is built?

Thank you very much!
Kevin Markham


------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
