Github user asolimando commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20632#discussion_r169742383
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
    @@ -287,6 +291,34 @@ private[tree] class LearningNode(
         }
       }
     
    +  /**
    +   * @return true iff the node is a leaf.
    +   */
    +  private def isLeafNode(): Boolean = leftChild.isEmpty && 
rightChild.isEmpty
    +
    +  // the set of (leaf) predictions appearing in the subtree rooted at the 
given node.
    +  private lazy val leafPredictions: Set[Double] = {
    --- End diff --
    
    The variant proposed by @sethah looks complete to me, and it was indeed not 
necessary to store the predictions.
    
    If I got right the counterexample of @srowen, the new code would behave on 
it in this way:
    
    (input tree)
    1: InnerNode (not built yet)
    --1.1: InnerNode (not built yet)
    ----1.1.1 (label x): LeafNode (not built yet)
    ----1.1.2 (label x): LeafNode (not built yet)
    --1.2: InnerNode (not built yet)
    ----1.2.1 (label x): LeafNode (not built yet)
    ----1.2.2 (label x): LeafNode (not built yet)
    
    toNode on 1.1.1 and 1.1.2 builds two LeafNode, so when recursion come back 
to the context of 1.1, we have the following situation:
    
    1: InnerNode (not built yet)
    --1.1: InnerNode (not built yet)
    ----1.1.1 (label x): LeafNode (not built yet)
    ----1.1.2 (label x): LeafNode (not built yet)
    --1.2: InnerNode (not built yet)
    ----1.2.1 (label x): LeafNode (not built yet)
    ----1.2.2 (label x): LeafNode (not built yet)
    
    Therefore, the first case is triggered, and 1.1 becomes a LeafNode, giving 
us this situation:
    
    1: InnerNode (not built yet)
    --1.1 (label x): LeafNode
    --1.2: InnerNode (not built yet)
    ----1.2.1 (label x): LeafNode (not built yet)
    ----1.2.2 (label x): LeafNode (not built yet)
    
    Then, toNode is called on 1.2, and similarly to what happened to subtree 
rooted at 1.1 we have a match on the first case, resulting in this situation:
    
    1: InnerNode (not built yet)
    --1.1 (label x): LeafNode 
    --1.2 (label x): LeafNode 
    
    Now toNode on 1.1 and 1.2 has built two LeafNode, and the first case 
matches also in the context of node 1, that is finally translated into a single 
LeafNode, as expected.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to