Hi,

Regarding MSE minimization vs. variance reduction: it's been a few years, but I remember we had a discussion about this, where Gilles Louppe explained that the two are identical. I had been confused by the Wikipedia equation at https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction (I didn't read carefully and somehow thought that x_i etc. referred to feature columns, rather than x being the target variable :P).
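To make the "identical" part concrete, here is a rough sketch in plain NumPy. It is not scikit-learn's actual (Cython) splitter, just my reading of the idea. A node predicts the mean of its targets, so its MSE is simply the (population) variance of those targets. Each child is weighted by its share of the node's samples, so the children's variances are not simply added up. Minimizing that weighted child MSE is the same thing as maximizing the variance reduction, parent variance minus weighted child variance, because the parent variance is fixed for a given node. The helper names (node_mse, best_split_for_feature, best_split) are made up for illustration. Placing thresholds at midpoints between consecutive distinct values is also an assumption on my part.

```python
import numpy as np


def node_mse(y):
    """MSE of predicting the node mean for every sample = population variance of y."""
    return float(np.mean((y - y.mean()) ** 2))


def best_split_for_feature(x, y):
    """Exhaustively try every threshold on one feature column x.

    Returns (threshold, weighted_child_mse, variance_reduction) for the
    threshold that minimizes the sample-weighted MSE of the two children.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y)
    parent = node_mse(y)
    best = (None, np.inf, 0.0)
    for i in range(1, n):
        # Only split between distinct feature values.
        if x_sorted[i] == x_sorted[i - 1]:
            continue
        y_left, y_right = y_sorted[:i], y_sorted[i:]
        # Children are weighted by their fraction of samples (n_left/n, n_right/n).
        # Equivalently: (SSE_left + SSE_right) / n, the total squared error around
        # each child's own mean.
        child = (i / n) * node_mse(y_left) + ((n - i) / n) * node_mse(y_right)
        if child < best[1]:
            threshold = 0.5 * (x_sorted[i - 1] + x_sorted[i])  # assumed midpoint rule
            best = (threshold, child, parent - child)
    return best


def best_split(X, y, feature_indices):
    """Pick the best (feature, threshold) among the candidate features.

    In a random forest, feature_indices would be the random subset of
    max_features columns drawn at this node.
    """
    best = (None, None, np.inf, 0.0)
    for j in feature_indices:
        threshold, child, reduction = best_split_for_feature(X[:, j], y)
        if threshold is not None and child < best[2]:
            best = (j, threshold, child, reduction)
    return best


if __name__ == "__main__":
    X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
    y = np.array([1.0, 1.2, 3.0, 3.1])
    feature, threshold, child_mse, reduction = best_split(X, y, [0, 1])
    print(feature, threshold, child_mse, reduction)
```

In this toy example, the split chosen is feature 0 at threshold 2.5, which separates the low targets from the high ones. If I'm not mistaken, fitting DecisionTreeRegressor(max_depth=1) on the same data and inspecting tree_.feature[0] and tree_.threshold[0] should pick the same split (ties aside), but please double-check that against your version.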
A better resource: I think Gilles also covered this in his thesis, but I currently can't find the exact page. The thesis should be accessible from https://arxiv.org/abs/1407.7502, though, and I would recommend taking a look at Section 3.6.3, "Finding the best binary split", and page 108 onward for how it's implemented (assuming that is still up to date with the current implementation). That should probably address all your questions :).

Best,
Sebastian

> On Aug 30, 2017, at 5:50 AM, Evans J.R.A. <jonny.ev...@soton.ac.uk> wrote:
>
> Hi there,
>
> I would like to fully understand how the Random Forest Regressor chooses how
> to split the data at each node.
>
> I understand that each tree considers a bootstrap sample of the training data,
> and on each split a random subset of features (using max_features) are
> considered. But among these features, how does the algorithm work out which
> is the best split to make? I am using the default criterion ‘mse’, but don’t
> understand the given explanation “equal to variance reduction as feature
> selection criterion”. Does this mean that for each possible split that could
> be made, the sum of variances of data in the child nodes is calculated, then
> the algorithm would use the split with the least sum of variances?
>
> Kind regards,
>
> Jonny Evans
> Doctoral Researcher
> Transportation Research Group
> Faculty of Engineering and the Environment
> University of Southampton
> Email: jonny.ev...@soton.ac.uk
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn