Github user asolimando commented on the issue:

    https://github.com/apache/spark/pull/20632
  
    Hello Sean,
    here is my understanding of the problem and the main intuition of the 
proposed solution:
    We want to have a tree such that it does not contain any redundant subtree.
    A subtree is redundant iff it is not composed by a leaf and all the leaves 
belonging to it agree on the prediction (the leafPredictions has at least two 
distinct values).
    
    Given that for any subtree there is exactly one root node, we "store" this 
information in its root (learning) node.
    
    A redundant node can be replaced by a single leaf, predicting the single 
prediction value reachable via that subtree.
    
    So, there are basically two cases in our top-down exploration:
    1) The subtree is redundant -> the whole subtree is explored (linearly) to 
collect the predictions, the node is translated into a LeafNode, and thus the 
toNode method is never called on its children, so no extra exploration
    2) The subtree is *not* redundant -> the whole subtree is explored 
(linearly) to collect the predictions, the node is translated into an 
InternalNode, and toNode recursively called on its children. However, at this 
point, the lazy value shouldBeLeaf (private lazy val shouldBeLeaf: Boolean) has 
been already set in the previous call, so there won't be any further 
exploration. Each further step down the recursion path will be able to "reason" 
locally on stored shouldBeLeaf values.
    
    Indeed, I was about to switch from the first top-down version to a 
bottom-up one for avoiding the repeated explorations, but then I realized that 
"lazy" could achieve the same result, while keeping the top-down exploration 
that is, in my opinion, more straightforward to read.
    
    I have updated the PR according to your suggestions, I haven't squashed the 
commits yet in case there will be further comments.
    
    I will reply to your comments below so I hope I won't miss anything.
    
    Thanks for your help.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to