Hi Kevin.
The trees have a "max_features" parameter, which limits the number of
features considered at each split.
This is not usually used in single decision trees, but rather in random
forests.
If "max_features" is set, then "random_state" is used to select which
features are considered at each split.
I think "random_state" is not used otherwise, but Arnauld and Gilles know better ;)
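A rough sketch of the idea in plain Python, not scikit-learn's actual code: at each split a seeded random number generator draws the subset of features to examine, so the same seed reproduces the same draws. The helper name candidate_features and the numbers used here are made up for illustration.

```python
import random


def candidate_features(n_features, max_features, rng):
    """Return the feature indices to examine at one split (hypothetical helper)."""
    # With no limit, every feature is a candidate and no random draw is needed.
    if max_features is None or max_features >= n_features:
        return list(range(n_features))
    # Otherwise draw max_features distinct indices using the seeded generator.
    return sorted(rng.sample(range(n_features), max_features))


# Two generators built from the same seed produce the same sequence of subsets,
# which is the role a fixed random_state would play here.
rng_a = random.Random(0)
rng_b = random.Random(0)
draws_a = [candidate_features(10, 3, rng_a) for _ in range(5)]
draws_b = [candidate_features(10, 3, rng_b) for _ in range(5)]
```

Here draws_a and draws_b are equal because both generators start from the same seed; a different seed would generally give different subsets and therefore possibly a different tree.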
Hth,
Andy
On 10/14/2015 11:33 AM, Kevin Markham wrote:
Hello,
I'm a data science instructor who uses scikit-learn extensively in
the classroom. Yesterday I was teaching decision trees, and I
summarized the tree building process (for regression trees) as follows:
1. Begin at the top of the tree.
2. For every feature, examine every possible cutpoint, and choose the
feature and cutpoint such that the resulting tree has the lowest
possible mean squared error (MSE). Make that split.
3. Examine the two resulting regions, and again make a single split
(in one of the regions) to minimize the MSE.
4. Keep repeating step 3 until a stopping criterion is met.
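Step 2 above can be sketched in plain Python as follows. This is only an illustration of an exhaustive search under simple assumptions (cutpoints taken as midpoints between consecutive distinct values, splits scored by the summed squared error of the two regions, which ranks splits the same way as MSE when the number of samples is fixed). The names sse and best_split and the toy data are hypothetical, and this is not how scikit-learn implements it.

```python
def sse(values):
    """Sum of squared errors of values around their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature, cutpoint, total_sse) for the best single split, or None."""
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        # Candidate cutpoints: midpoints between consecutive distinct values.
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= cut]
            right = [t for row, t in zip(X, y) if row[j] > cut]
            total = sse(left) + sse(right)
            # Strict comparison: the first split found keeps its place on ties.
            if best is None or total < best[2]:
                best = (j, cut, total)
    return best


# Toy data: feature 0 separates the two target levels perfectly at 2.5.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
split = best_split(X, y)
```

On this toy data the search picks feature 0 with a cutpoint of 2.5, since that split leaves no error in either region. Growing a full tree would repeat the same search inside each resulting region until a stopping criterion is met.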
One question that came up is why there is a random_state parameter for
a DecisionTreeRegressor (or a DecisionTreeClassifier). Assuming that
an exhaustive search is performed before each split (meaning that
every possible cutpoint is checked for every feature), it is not
obvious to me what randomness is used during the tree building
process, such that a random_state is necessary.
My best guesses were that the random_state is used for tiebreaking, or
perhaps that the search for the best split is not exhaustive and thus
random_state affects the way in which the search is performed.
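The tiebreaking guess can be illustrated with a small plain-Python sketch. It assumes, purely for illustration, that features are visited in some order and that the first feature reaching the lowest error is kept; under that assumption, two equally good features are separated only by the visiting order, which a seeded shuffle would control. The names sse and best_feature and the data are hypothetical.

```python
import random


def sse(values):
    """Sum of squared errors of values around their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_feature(X, y, order):
    """Return the feature whose best cutpoint has the lowest error.

    Features are visited in the given order; the first one visited wins ties.
    """
    best_j, best_total = None, float("inf")
    for j in order:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= cut]
            right = [t for row, t in zip(X, y) if row[j] > cut]
            total = sse(left) + sse(right)
            if total < best_total:
                best_j, best_total = j, total
    return best_j


# Feature 1 is feature 0 scaled by 10, so both give equally good splits.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y = [1.0, 1.0, 5.0, 5.0]
first = best_feature(X, y, [0, 1])   # feature 0 is visited first
second = best_feature(X, y, [1, 0])  # feature 1 is visited first

# A seeded shuffle fixes the visiting order, and with it the winner of the tie.
order = [0, 1]
random.Random(42).shuffle(order)
seeded = best_feature(X, y, order)
```

With the order [0, 1] the sketch returns feature 0, and with [1, 0] it returns feature 1, even though both splits fit the data equally well.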
In summary, I am asking: Why is a random_state necessary for decision
trees?
As a corollary, I am asking: Am I correctly representing how a
decision tree is built?
Thank you very much!
Kevin Markham
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general