Hi, 

Regarding MSE minimization vs. variance reduction: it's been a few years, but I 
remember we had a discussion about this in which Gilles Louppe explained 
that the two are identical. At the time I was confused by the Wikipedia equation 
at https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction (I 
didn't read carefully and somehow thought that x_i etc. referred to feature 
columns, when x is actually the target variable :P). 
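
To make the equivalence concrete, here is a minimal NumPy sketch (not 
scikit-learn's actual Cython code; the names sse and best_split are made up 
for illustration). It scans every threshold on a single feature and checks 
that the summed squared error of the two children equals n times their 
size-weighted variance, so minimizing one minimizes the other. Note that the 
child variances are weighted by the number of samples in each child, rather 
than simply added up.

```python
import numpy as np


def sse(y):
    """Sum of squared errors around the mean (the best constant prediction)."""
    return float(np.sum((y - y.mean()) ** 2))


def best_split(x, y):
    """Scan all thresholds on one feature (hypothetical helper, for illustration).

    Returns (threshold, child_sse, variance_reduction) for the split that
    minimizes the summed squared error of the two children.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    parent_var = y.var()
    best = None
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate identical feature values
        left, right = y[:i], y[i:]
        child_sse = sse(left) + sse(right)
        # Child variances weighted by the fraction of samples in each child.
        weighted_var = (len(left) * left.var() + len(right) * right.var()) / n
        # The two quantities differ only by the constant factor n.
        assert np.isclose(child_sse, n * weighted_var)
        if best is None or child_sse < best[1]:
            threshold = (x[i - 1] + x[i]) / 2.0
            best = (threshold, child_sse, parent_var - weighted_var)
    return best


# Toy data (an assumption for the demo): the target jumps from about 1 to about 5 at x = 4.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = np.where(x < 4, 1.0, 5.0) + rng.normal(0, 0.1, size=50)

threshold, child_sse, reduction = best_split(x, y)
print(threshold, child_sse, reduction)
```

Since the parent variance is the same for every candidate split, the split 
with the smallest child squared error is also the one with the largest 
variance reduction.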

A better resource: I think Gilles also covers this in his thesis, but 
I can't find the exact page right now. The thesis should be accessible from 
https://arxiv.org/abs/1407.7502 though, and I would recommend taking a look at 
section 3.6.3, "Finding the best binary split", and at page 108 onward for how 
it's implemented (assuming this is still up to date with the current 
implementation). This would probably address all your questions :).

Best,
Sebastian



> On Aug 30, 2017, at 5:50 AM, Evans J.R.A. <jonny.ev...@soton.ac.uk> wrote:
> 
> Hi there,
>  
> I would like to fully understand how the Random Forest Regressor chooses how 
> to split the data at each node.
>  
> I understand that each tree considers a bootstrap sample of the training data, 
> and on each split a random subset of features (using max_features) is 
> considered. But among these features, how does the algorithm work out which 
> is the best split to make? I am using the default criterion ‘mse’, but don’t 
> understand the given explanation “equal to variance reduction as feature 
> selection criterion”. Does this mean that for each possible split that could 
> be made, the sum of variances of data in the child nodes is calculated, then 
> the algorithm would use the split with the least sum of variances?
>  
> Kind regards,
>  
> Jonny Evans
> Doctoral Researcher
> Transportation Research Group
> Faculty of Engineering and the Environment
> University of Southampton
> Email: jonny.ev...@soton.ac.uk
>  
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

