GitHub user erikerlandson commented on a diff in the pull request:
https://github.com/apache/spark/pull/13440#discussion_r218252156
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala ---
@@ -71,6 +71,23 @@ object Gini extends Impurity {
@Since("1.1.0")
def instance: this.type = this
+ /**
+ * :: DeveloperApi ::
+ * p-values for test-statistic measures, unsupported for [[Gini]]
+ */
+ @Since("2.2.0")
+ @DeveloperApi
+ def calculate(calcL: ImpurityCalculator, calcR: ImpurityCalculator): Double =
--- End diff --
I suspect the generalization is closer to my newer signature,
`val pval = imp.calculate(leftImpurityCalculator, rightImpurityCalculator)`,
which gives you all the context from both the left and right nodes. The existing
gain-based calculation should fit into this framework too: it would just compute
its current weighted average of purity gain from the two calculators.
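
To make that concrete, here is a rough, self-contained sketch of how a gain-based measure could sit behind a two-calculator signature. It does not use the real Spark classes: `ClassCounts`, `SplitScore`, and `GiniGain` are hypothetical stand-ins invented for illustration, and the gain formula (parent impurity minus the size-weighted child impurities) is my assumption about what the "weighted average" would reduce to.

```scala
// Hypothetical, simplified stand-in for ImpurityCalculator: per-class label counts only.
final case class ClassCounts(counts: Vector[Double]) {
  def total: Double = counts.sum

  // Statistics of the parent node are the element-wise sum of the two children.
  def merge(other: ClassCounts): ClassCounts =
    ClassCounts(counts.zip(other.counts).map { case (a, b) => a + b })
}

// Hypothetical interface: a split is scored from the left and right child statistics.
trait SplitScore {
  def calculate(left: ClassCounts, right: ClassCounts): Double
}

// Gain-based scoring expressed through the same two-argument signature.
object GiniGain extends SplitScore {
  // Gini impurity: 1 - sum of squared class frequencies (0 for an empty node).
  def gini(c: ClassCounts): Double = {
    val n = c.total
    if (n == 0.0) 0.0
    else 1.0 - c.counts.map(x => (x / n) * (x / n)).sum
  }

  // Parent impurity minus the size-weighted impurities of the two children.
  override def calculate(left: ClassCounts, right: ClassCounts): Double = {
    val parent = left.merge(right)
    val n = parent.total
    if (n == 0.0) 0.0
    else gini(parent) - (left.total / n) * gini(left) - (right.total / n) * gini(right)
  }
}
```

With something like this, a p-value-based measure and a gain-based measure would differ only in the body of `calculate`; the caller passes the same pair of left/right statistics either way.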
---