Thanks Jacob!

You explain the ideas behind the two builders very well!


Best,

Hanna

________________________________
From: scikit-learn <scikit-learn-bounces+hzmao=hotmail....@python.org> on 
behalf of Jacob Schreiber <jmschreibe...@gmail.com>
Sent: Friday, September 22, 2017 1:02:54 PM
To: Scikit-learn mailing list
Subject: Re: [scikit-learn] Decision Tree Regressor - DepthFirstTreeBuilder vs 
BestFirstTreeBuilder

Hi Hanna

Thanks for the questions!

1) Best first tends to produce unbalanced but sparser trees, and frequently 
produces more generalizable models by only capturing the most important 
interactions. Unbalanced isn't necessarily bad either. You can imagine that 
some parts of the tree have complex split rules that are important to learn, 
while in other parts the additional splits only improve purity a tiny bit and 
risk overfitting (and thus being less generalizable).
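
To make point 1 a bit more concrete, here is a rough sketch (not a benchmark) that fits one tree limited by max_leaf_nodes, which selects the best-first builder, and one limited by max_depth, which stays on the depth-first builder. The toy dataset is an assumption chosen purely for illustration: flat on the left half and wiggly on the right, so most of the useful splits sit on one side.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
# Assumed toy target: constant for x < 5, oscillating for x >= 5, plus noise.
y = np.where(X[:, 0] < 5, 0.0, np.sin(5 * X[:, 0]))
y = y + rng.normal(scale=0.05, size=500)

# Setting max_leaf_nodes switches tree construction to best first.
best_first_tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0)
best_first_tree.fit(X, y)

# Without max_leaf_nodes the tree is grown depth first; cap it by depth.
depth_first_tree = DecisionTreeRegressor(max_depth=3, random_state=0)
depth_first_tree.fit(X, y)

# Compare how the same leaf budget is spent in the two trees.
for name, tree in [("best first", best_first_tree),
                   ("depth first", depth_first_tree)]:
    print(name, "leaves:", tree.get_n_leaves(), "depth:", tree.get_depth())
```

A best-first tree that is deeper than the depth-limited one for a similar number of leaves is the "unbalanced but sparser" shape described above.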

2) If you let best first and depth first run until purity is reached, they will 
produce identical trees. The only difference is the ordering of the nodes as 
they get added to the tree. Best first will add nodes to the tree ordered by 
their increase in purity, and depth first adds nodes essentially in the order 
one would do a depth-first search. If one were to stop best first building 
early, they would get a tree where the important interactions are captured 
first, whereas if one were to stop a depth-first build early, they would get a 
really good split of one or maybe a few areas of the dataset (generally 
speaking). The reason max_leaf_nodes decides whether BestFirstTreeBuilder is 
used is that it doesn't make sense to limit a depth-first build by the number 
of nodes, and it doesn't make sense to run BestFirstTreeBuilder without 
limiting the number of nodes in the tree.
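
The ordering argument in point 2 can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the scikit-learn implementation: the data is one already-sorted 1-D target, a node is an index range (lo, hi), and the helper names (sse, best_split, depth_first, best_first) are made up for this example.

```python
import heapq

def sse(y, lo, hi):
    """Sum of squared errors of y[lo:hi] around its mean (the impurity)."""
    seg = y[lo:hi]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)

def best_split(y, lo, hi):
    """Return (gain, split_index) for the best split of y[lo:hi], or None."""
    best = None
    for s in range(lo + 1, hi):
        gain = sse(y, lo, hi) - sse(y, lo, s) - sse(y, s, hi)
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, s)
    return best

def depth_first(y):
    """Grow to purity, expanding nodes in depth-first (stack) order."""
    splits, leaves, stack = [], [], [(0, len(y))]
    while stack:
        lo, hi = stack.pop()
        found = best_split(y, lo, hi)
        if found is None:
            leaves.append((lo, hi))  # pure node, nothing left to split
            continue
        _, s = found
        splits.append((lo, s, hi))
        stack.append((s, hi))  # right child is expanded later
        stack.append((lo, s))  # left child is expanded next
    return splits, sorted(leaves)

def best_first(y, max_leaf_nodes=None):
    """Expand the frontier node with the largest impurity decrease first."""
    splits, leaves, heap = [], [], []

    def push(lo, hi):
        found = best_split(y, lo, hi)
        if found is None:
            leaves.append((lo, hi))
        else:
            gain, s = found
            heapq.heappush(heap, (-gain, lo, s, hi))  # max-heap on gain

    push(0, len(y))
    while heap:
        # Every node still on the frontier currently counts as a leaf.
        if max_leaf_nodes is not None and len(leaves) + len(heap) >= max_leaf_nodes:
            break
        _, lo, s, hi = heapq.heappop(heap)
        splits.append((lo, s, hi))
        push(lo, s)
        push(s, hi)
    leaves.extend((lo, hi) for _, lo, _, hi in heap)
    return splits, sorted(leaves)

y = [0, 0, 1, 1, 10, 10, 10, 20]
df_splits, df_leaves = depth_first(y)
bf_splits, bf_leaves = best_first(y)
early_splits, early_leaves = best_first(y, max_leaf_nodes=3)

print("depth-first split order:", df_splits)
print("best-first split order: ", bf_splits)
print("best first, 3 leaves:   ", early_leaves)
```

Run to purity, both builders end up with the same splits and leaves and differ only in the order the splits are made. With the 3-leaf cap, best first spends its second split on the side with the large jump (10 vs 20) rather than on the small 0 vs 1 difference that depth first would visit first.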

Let me know if you have any further questions!

Jacob

On Thu, Sep 21, 2017 at 1:38 PM, hanzi mao <hz...@hotmail.com> wrote:

Hi,


I am reading the source code of the Decision Tree Regressor in sklearn. There 
are two ways to build a tree: depth first and best first. The best-first 
approach is used only when the user sets max_leaf_nodes. Otherwise, the tree 
is built using the DepthFirstTreeBuilder. My questions are:


  1.  Are there any practical considerations for choosing between depth-first 
and best-first? Does the depth-first approach have an overwhelming advantage 
in performance or popularity over the best-first one that makes it the default 
choice?
  2.  I am somewhat confused about why an optional parameter, max_leaf_nodes, 
decides whether BestFirstTreeBuilder is used. I am wondering what 
considerations led to this design.

Thanks!

Best,
Hanna

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

