GitHub user asolimando opened a pull request:

    https://github.com/apache/spark/pull/20632

    [SPARK-3159] added subtree pruning in the translation from LearningNode to 
Node, added unit tests for tree redundancy and adapted existing ones that were 
affected

    ## What changes were proposed in this pull request?
    
    Added subtree pruning in the translation from LearningNode to Node: a 
learning node whose subtree has the same prediction value at every leaf is 
translated into a single LeafNode instead of a (redundant) InternalNode.
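
A rough illustration of the pruning rule described above, as a minimal sketch. The classes below are simplified stand-ins written for this example, not the actual Spark `LearningNode`, `LeafNode`, and `InternalNode` definitions (which also carry splits and impurity statistics), and the method name `toNode` is used here only by analogy:

```scala
// Simplified stand-ins for illustration only; NOT Spark's actual classes.
sealed trait Node { def prediction: Double }
final case class LeafNode(prediction: Double) extends Node
final case class InternalNode(prediction: Double, left: Node, right: Node) extends Node

// Hypothetical learning-time node: children are optional while the tree grows.
final case class LearningNode(
    prediction: Double,
    left: Option[LearningNode] = None,
    right: Option[LearningNode] = None) {

  // Translate bottom-up: children are converted first, so a whole subtree
  // whose leaves agree on one prediction collapses into a single LeafNode.
  def toNode: Node = (left, right) match {
    case (Some(l), Some(r)) =>
      (l.toNode, r.toNode) match {
        // Both children ended up as leaves with the same prediction:
        // the split is redundant, so emit one leaf instead.
        case (LeafNode(p), LeafNode(q)) if p == q => LeafNode(p)
        case (ln, rn) => InternalNode(prediction, ln, rn)
      }
    // No (complete pair of) children: this node is already a leaf.
    case _ => LeafNode(prediction)
  }
}
```

Under these assumptions, a root whose left child splits into two leaves predicting 1.0 and whose right child is a leaf predicting 1.0 would translate to a single `LeafNode(1.0)`, while a root with leaves predicting 0.0 and 1.0 would stay an `InternalNode`.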
    
    ## How was this patch tested?
    
    Added two unit tests under 
"mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala":
    - test("[SPARK-3159] tree model redundancy - binary classification")
    - test("[SPARK-3159] tree model redundancy - multiclass classification")
    
    4 existing unit tests relying on the tree structure (existence of a 
specific redundant subtree) had to be adapted, as the tested components in the 
output tree are now pruned.
    These tests were checking properties of some nodes (e.g., the details of 
the split). Following what is done in several other unit tests in the same 
files, they have been adapted to work on the intermediate data used in the 
learning phase rather than on the output model.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/asolimando/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20632
    
----
commit 0a78d16d6d7dc6aef49b39d505e4297d73f70a0e
Author: Alessandro Solimando <18898964+asolimando@...>
Date:   2018-02-17T05:07:10Z

    [SPARK-3159] added subtree pruning in the translation from LearningNode to 
Node, added unit tests for tree redundancy and adapted existing ones that were 
affected

commit 28da02f6c02aeebfa1512642b88dcc6c25b8a33d
Author: Alessandro Solimando <18898964+asolimando@...>
Date:   2018-02-17T05:08:38Z

    Merge branch 'SPARK-3159'

----

