Hi there,

I would like to fully understand how the Random Forest Regressor chooses how to 
split the data at each node.

I understand that each tree considers a bootstrap sample of the training data, 
and that at each split a random subset of features (controlled by max_features) 
is considered. But among these features, how does the algorithm work out which is 
the best split to make? I am using the default criterion 'mse', but don't 
understand the given explanation "equal to variance reduction as feature 
selection criterion". Does this mean that for each possible split that could be 
made, the sum of variances of data in the child nodes is calculated, then the 
algorithm would use the split with the least sum of variances?
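
To make the question concrete, here is a minimal sketch of the procedure as I currently imagine it. It is only my guess, not scikit-learn's actual code: the function name best_split is hypothetical, and I am assuming that each child's variance is weighted by the number of samples in that child and that candidate thresholds are the midpoints between consecutive distinct feature values.

```python
import numpy as np

def best_split(X, y, feature_indices):
    """Search the candidate features for the split that minimises the
    sample-weighted sum of child-node variances (hypothetical helper)."""
    best = None  # (impurity, feature index, threshold)
    for j in feature_indices:
        values = np.unique(X[:, j])  # sorted distinct values of feature j
        # Assumed candidate thresholds: midpoints between consecutive values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            # n_left * var(left) + n_right * var(right), i.e. the total
            # squared error of each child around its own mean.
            impurity = left.size * left.var() + right.size * right.var()
            if best is None or impurity < best[0]:
                best = (impurity, j, t)
    return best

# Toy data: the target depends only on feature 0.
X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y = np.array([0.0, 0.0, 10.0, 10.0])

# In a forest, feature_indices would be the random subset drawn at this node.
impurity, feature, threshold = best_split(X, y, feature_indices=[0, 1])
```

On this toy data the sketch picks feature 0 with a threshold of 2.5, since that split leaves both children with zero variance. Is this roughly what happens internally?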

Kind regards,

Jonny Evans
Doctoral Researcher
Transportation Research Group
Faculty of Engineering and the Environment
University of Southampton
Email: jonny.ev...@soton.ac.uk

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
