GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/20758
[SPARK-14681][ML] Provide label/impurity stats for spark.ml decision tree nodes
## What changes were proposed in this pull request?
Provide label/impurity stats for spark.ml decision tree nodes.
API:
```scala
class TreeClassifierStatInfo {
  def getLabelCount(label: Int): Double
}

class TreeRegressorStatInfo {
  def getCount(): Double
  def getSum(): Double
  def getSquareSum(): Double
}

class Node {
  // added by this PR
  def statInfo: TreeStatInfo
}

trait TreeStatInfo {
  def asTreeClassifierStatInfo: TreeClassifierStatInfo
  def asTreeRegressorStatInfo: TreeRegressorStatInfo
}
```
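As a rough illustration of what these statistics make possible, the sketch below recomputes a node's impurity from the exposed values: Gini impurity from the per-label counts, and variance from the count, sum, and sum of squares. It does not use Spark; `TreeStatsSketch`, `ClassifierStats`, `RegressorStats`, `gini`, `variance`, and `fromLabels` are hypothetical stand-ins that only mirror the getter names proposed above, and it assumes label counts are stored densely by label index.

```scala
// Hypothetical stand-ins mirroring the proposed getters; NOT Spark's classes.
object TreeStatsSketch {

  // Per-node classification stats: weighted count per label index (assumed dense).
  final class ClassifierStats(labelCounts: Array[Double]) {
    def getLabelCount(label: Int): Double = labelCounts(label)

    // Gini impurity: 1 - sum_k p_k^2, with p_k = count_k / total.
    def gini: Double = {
      val total = labelCounts.sum
      if (total == 0.0) 0.0
      else 1.0 - labelCounts.map { c => val p = c / total; p * p }.sum
    }
  }

  // Per-node regression stats: count, sum of labels, sum of squared labels.
  final class RegressorStats(count: Double, sum: Double, squareSum: Double) {
    def getCount(): Double = count
    def getSum(): Double = sum
    def getSquareSum(): Double = squareSum

    // Variance impurity from sufficient statistics: E[y^2] - (E[y])^2.
    def variance: Double =
      if (count == 0.0) 0.0
      else {
        val mean = sum / count
        squareSum / count - mean * mean
      }
  }

  object RegressorStats {
    // Accumulate the three statistics from the raw labels reaching a node.
    def fromLabels(ys: Seq[Double]): RegressorStats =
      new RegressorStats(ys.size.toDouble, ys.sum, ys.map(y => y * y).sum)
  }
}
```

For example, label counts `Array(3.0, 1.0)` give a Gini impurity of 1 - (0.75^2 + 0.25^2) = 0.375, and labels 1, 2, 3 give count 3, sum 6, and square sum 14, hence a variance of 14/3 - 2^2, about 0.667. Keeping count, sum, and square sum (rather than the variance itself) lets statistics from different partitions be merged by simple addition, though the E[y^2] - (E[y])^2 form can lose precision when labels are large relative to their spread.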
## How was this patch tested?
Unit tests added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/WeichenXu123/spark tree_stat_api
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20758.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #20758
commit e57ffaaad1666577d956c1f8f734f97569b93969
Author: WeichenXu
Date: 2018-03-07T10:37:22Z
init pr