GitHub user asolimando opened a pull request:
https://github.com/apache/spark/pull/20632
[SPARK-3159] added subtree pruning in the translation from LearningNode to
Node, added unit tests for tree redundancy and adapted existing ones that were
affected
## What changes were proposed in this pull request?
Added subtree pruning in the translation from LearningNode to Node: a
learning node whose subtree yields the same prediction value at every leaf
is translated into a LeafNode instead of a (redundant) InternalNode.
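
As an illustration of the pruning rule described above, here is a minimal sketch on simplified stand-in types. The names `SketchLearningNode`, `SketchLeaf`, `SketchInternal` and `PruningSketch.toNode` are hypothetical and are not Spark's actual classes or the code in this patch; the sketch also assumes that a collapsed leaf simply reuses the prediction shared by its children.

```scala
// Simplified stand-ins for illustration only; NOT Spark's actual classes.
sealed trait SketchNode { def prediction: Double }
case class SketchLeaf(prediction: Double) extends SketchNode
case class SketchInternal(
    prediction: Double,
    left: SketchNode,
    right: SketchNode) extends SketchNode

// Hypothetical learning-time node: children are present only if it was split.
case class SketchLearningNode(
    prediction: Double,
    left: Option[SketchLearningNode] = None,
    right: Option[SketchLearningNode] = None)

object PruningSketch {
  // Translate bottom-up: children are converted first, so a subtree whose
  // leaves all share one prediction collapses level by level into one leaf.
  def toNode(n: SketchLearningNode): SketchNode = (n.left, n.right) match {
    case (Some(l), Some(r)) =>
      (toNode(l), toNode(r)) match {
        // Both children ended up as leaves with the same prediction:
        // the split is redundant, so emit a single leaf (assumption: the
        // merged leaf keeps the shared child prediction).
        case (SketchLeaf(a), SketchLeaf(b)) if a == b => SketchLeaf(a)
        case (cl, cr) => SketchInternal(n.prediction, cl, cr)
      }
    // A node without both children is treated as a leaf.
    case _ => SketchLeaf(n.prediction)
  }
}
```

With these definitions, a two-level tree in which every leaf predicts 1.0 would translate to `SketchLeaf(1.0)`, while a split separating predictions 0.0 and 1.0 would be kept as a `SketchInternal`.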
## How was this patch tested?
Added two unit tests under
"mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala":
- test("[SPARK-3159] tree model redundancy - binary classification")
- test("[SPARK-3159] tree model redundancy - multiclass classification")
4 existing unit tests relying on the tree structure (the existence of a
specific redundant subtree) had to be adapted, as the tested components in the
output tree are now pruned.
These tests were checking properties of some nodes (e.g., the details of
the split). Following what is done in several other unit tests in the same
files, they have been adapted to work on the intermediate data used in the
learning phase rather than on the output model.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asolimando/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20632.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20632
----
commit 0a78d16d6d7dc6aef49b39d505e4297d73f70a0e
Author: Alessandro Solimando <18898964+asolimando@...>
Date: 2018-02-17T05:07:10Z
[SPARK-3159] added subtree pruning in the translation from LearningNode to
Node, added unit tests for tree redundancy and adapted existing ones that were
affected
commit 28da02f6c02aeebfa1512642b88dcc6c25b8a33d
Author: Alessandro Solimando <18898964+asolimando@...>
Date: 2018-02-17T05:08:38Z
Merge branch 'SPARK-3159'
----