Andy and Arnaud: Thank you for the answers, and also for your work on this excellent library!
Dale: Thank you for the pointer!

Best,
Kevin

On Fri, Oct 16, 2015 at 8:51 AM, Dale Smith <dsm...@nexidia.com> wrote:
> I am studying Gilles Louppe's dissertation, which contains the best
> explanation of various properties of tree methods. If you want to know
> more, I would start here.
>
> http://www.montefiore.ulg.ac.be/~glouppe/pdf/phd-thesis.pdf
>
> Dale Smith, Ph.D.
> Data Scientist
>
> d. 404.495.7220 x 4008  f. 404.795.7221
> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305
>
> -----Original Message-----
> From: Arnaud Joly [mailto:a.j...@ulg.ac.be]
> Sent: Thursday, October 15, 2015 7:29 AM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] Utility of random_state parameter for decision trees
>
> Your intuition is correct. For a decision tree with max_features=None, the
> random_state is used to break ties randomly.
>
> Cheers,
> Arnaud
>
> > On 14 Oct 2015, at 17:33, Kevin Markham <justmark...@gmail.com> wrote:
> >
> > Hello,
> >
> > I'm a data science instructor who uses scikit-learn extensively in the
> > classroom. Yesterday I was teaching decision trees, and I summarized the
> > tree-building process (for regression trees) as follows:
> >
> > 1. Begin at the top of the tree.
> > 2. For every feature, examine every possible cutpoint, and choose the
> > feature and cutpoint such that the resulting tree has the lowest possible
> > mean squared error (MSE). Make that split.
> > 3. Examine the two resulting regions, and again make a single split (in
> > one of the regions) to minimize the MSE.
> > 4. Keep repeating step 3 until a stopping criterion is met.
> >
> > One question that came up is why there is a random_state parameter for a
> > DecisionTreeRegressor (or a DecisionTreeClassifier). Assuming that an
> > exhaustive search is performed before each split (meaning that every
> > possible cutpoint is checked for every feature), it is not obvious to me
> > what randomness is used during the tree-building process, such that a
> > random_state is necessary.
> >
> > My best guesses were that the random_state is used for tie-breaking, or
> > perhaps that the search for the best split is not exhaustive and thus
> > random_state affects the way in which the search is performed.
> >
> > In summary, I am asking: why is a random_state necessary for decision trees?
> >
> > As a corollary, I am asking: am I correctly representing how a decision tree is built?
> >
> > Thank you very much!
> > Kevin Markham
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
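[Editor's note] Step 2 of Kevin's summary (the exhaustive search over every feature and cutpoint) can be sketched as a toy function. This is not scikit-learn's actual implementation — the library does this in optimized Cython — just a minimal illustration of scoring each (feature, cutpoint) pair by the weighted MSE of the two resulting regions:

```python
import numpy as np

def best_split(X, y):
    """Toy exhaustive search: return (feature, threshold, mse) of the best split."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate cutpoints: midpoints between consecutive sorted unique values.
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Weighted MSE of the two regions, each predicting its own mean.
            mse = (left.var() * len(left) + right.var() * len(right)) / len(y)
            if mse < best[2]:
                best = (j, t, mse)
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
print(best_split(X, y))  # the cut at 2.5 yields two pure regions (MSE 0)
```

Steps 3 and 4 would then apply `best_split` recursively to each resulting region until a stopping criterion (e.g. minimum samples per leaf) is met.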
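[Editor's note] Arnaud's answer — that with max_features=None the random_state only breaks ties — can be demonstrated with a small sketch. The setup below is hypothetical (not from the thread): duplicating a feature makes the best split exactly tied between two columns, so which column the tree picks can vary with the seed, while a fixed seed is reproducible:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
x = rng.rand(100)
X = np.column_stack([x, x])           # feature 1 duplicates feature 0 -> tied splits
y = (x > 0.5).astype(int)

# Which feature ends up at the root can differ across seeds,
# since both features produce an identical impurity decrease.
roots = {DecisionTreeClassifier(random_state=seed).fit(X, y).tree_.feature[0]
         for seed in range(20)}
print(sorted(roots))

# Fixing random_state makes the tie-break (and the whole tree) reproducible.
a = DecisionTreeClassifier(random_state=0).fit(X, y)
b = DecisionTreeClassifier(random_state=0).fit(X, y)
assert a.tree_.feature[0] == b.tree_.feature[0]
```

Note that no particular pair of seeds is guaranteed to disagree; the point is only that the choice among exactly-tied splits is seed-dependent, which is why even the exhaustive (max_features=None) splitter accepts a random_state.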