Re: [scikit-learn] Splitting Method on RandomForestClassifier

Guillaume Lemaître Tue, 02 Oct 2018 12:03:27 -0700

This is driven by the parameter min_impurity_decrease. 
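
As a minimal sketch of how this parameter behaves (the synthetic dataset and the threshold value 0.05 are assumptions chosen only for illustration), a tree grown with a positive min_impurity_decrease stops splitting earlier than one grown with the default of 0.0. RandomForestClassifier accepts the same parameter and passes it on to its trees.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Default min_impurity_decrease=0.0: nodes keep splitting until they are pure
# (no other stopping parameters are set here).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A node is split only if the weighted impurity decrease is at least 0.05.
pruned_tree = DecisionTreeClassifier(
    min_impurity_decrease=0.05, random_state=0
).fit(X, y)

n_full = full_tree.tree_.node_count
n_pruned = pruned_tree.tree_.node_count
print("nodes without threshold:", n_full)
print("nodes with threshold:   ", n_pruned)
```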

Sent from my phone - sorry for being brief and for any misspellings.



  Original Message  
From: [email protected]
Sent: 2 October 2018 20:48
To: [email protected]
Reply to: [email protected]
Subject: Re: [scikit-learn] Splitting Method on RandomForestClassifier

This is explained here

http://scikit-learn.org/stable/modules/ensemble.html#random-forests:

"In addition, when splitting a node during the construction of the tree, the 
split that is chosen is no longer the best split among all features. Instead, 
the split that is picked is the best split among a random subset of the 
features."

and the "best split" (in the decision trees) among the random feature subset is 
based on maximizing information gain or equivalently minimizing child node 
impurity as described here: 
http://scikit-learn.org/stable/modules/tree.html#mathematical-formulation
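
To make the two quoted passages concrete, here is a rough sketch in plain NumPy. It is not scikit-learn's implementation: the helper names gini and best_split are hypothetical, the Gini criterion and the midpoint thresholds are assumptions, and the toy data is made up. It draws a random subset of features and then picks the split with the lowest weighted child impurity among them.

```python
import numpy as np


def gini(y):
    """Gini impurity of a label array (0.0 for an empty or pure node)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y, feature_ids):
    """Return (feature, threshold, weighted child impurity) for the split
    with the lowest weighted child impurity among the given features."""
    n = len(y)
    best = (None, None, np.inf)
    for j in feature_ids:
        values = np.unique(X[:, j])  # sorted unique values of feature j
        # Candidate thresholds: midpoints between consecutive values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 2] > 0.3).astype(int)  # toy labels that depend only on feature 2

# Random forest style: only a random subset of the features is considered.
subset = rng.choice(X.shape[1], size=3, replace=False)
feature, threshold, impurity = best_split(X, y, subset)

# Plain decision tree style: all features are considered.
feature_all, threshold_all, impurity_all = best_split(X, y, range(X.shape[1]))
print(subset, feature, impurity, feature_all, impurity_all)
```

If feature 2 is not in the random subset, the chosen split is only the best one available among the sampled features, which is the point of the first quoted passage.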


Looking at this, I have a question though ...

In the docs 
(http://scikit-learn.org/stable/modules/tree.html#mathematical-formulation) it 
says

"Select the parameters that minimises the impurity"

and

"Recurse for subsets Q_left and Q_right until the maximum allowable depth is 
reached"

But this is not the whole definition, right? There should be a 
condition that if the weighted average of the child node impurities is not 
smaller than the parent node impurity for any given feature, the tree growing 
algorithm terminates, right?
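
To spell out the condition I have in mind, here is a small sketch. The function name is hypothetical; the formula is the weighted impurity decrease given in the scikit-learn documentation for the min_impurity_decrease parameter, and the node sizes and impurities below are made-up numbers.

```python
def weighted_impurity_decrease(n_total, n_node, n_left, n_right,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity
                           - N_t_L / N_t * left_impurity)"""
    return (n_node / n_total) * (
        impurity
        - (n_right / n_node) * right_impurity
        - (n_left / n_node) * left_impurity
    )


# A split that separates a balanced two-class node perfectly (Gini 0.5 -> 0.0).
perfect = weighted_impurity_decrease(100, 100, 50, 50, 0.5, 0.0, 0.0)

# A split whose children are exactly as mixed as the parent: no decrease.
useless = weighted_impurity_decrease(100, 100, 50, 50, 0.5, 0.5, 0.5)

print(perfect, useless)
```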

Best,
Sebastian

> On Oct 2, 2018, at 10:49 AM, Guillaume Lemaître <[email protected]> 
> wrote:
> 
> In Random Forest, the best split for each feature is selected.
> Extremely Randomized Trees (Extra-Trees) will make a random split instead.
> On Tue, 2 Oct 2018 at 17:43, Michael Reupold
> <[email protected]> wrote:
>> 
>> Hello all,
>> I am currently struggling to find information on which specific split 
>> methods are used in RandomForestClassifier. Is it a random selection? A 
>> median? The best of a set of methods?
>> 
>> Kind regards
>> 
>> Michael Reupold
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 
> -- 
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/

