Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/7361#discussion_r34486955
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -53,14 +53,21 @@ class RegressionMetrics(predictionAndObservations:
RDD[(Double, Double)]) extend
)
summary
}
+ private lazy val SSerr = math.pow(summary.normL2(1), 2)
+ private lazy val SStot = summary.variance(0) * (summary.count - 1)
+ private lazy val SSreg = {
+ val yMean = summary.mean(0)
+ predictionAndObservations.map {
+ case (prediction, _) => math.pow(prediction - yMean, 2)
+ }.reduce(_ + _)
+ }
/**
- * Returns the explained variance regression score.
- * explainedVariance = 1 - variance(y - \hat{y}) / variance(y)
- * Reference: [[http://en.wikipedia.org/wiki/Explained_variation]]
+ * Returns the variance explained by regression.
+ * @see [[https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained]]
*/
def explainedVariance: Double = {
- 1 - summary.variance(1) / summary.variance(0)
+ SSreg / summary.count
--- End diff --
I updated the reference to one that discusses explained/unexplained variance in the context of regression and also provides explicit formulas for the calculation. The calculation before this PR doesn't seem to correspond to anything in either reference.

When the regression model is unbiased (e.g. has an intercept term), the sum of squares can be partitioned (SStot = SSreg + SSerr) and the fraction of variance explained (SSreg / SStot) [is R^2](https://en.wikipedia.org/wiki/Coefficient_of_determination#As_explained_variance).

The [same reference](https://en.wikipedia.org/wiki/Coefficient_of_determination#As_explained_variance) defines explained variance as the variance of the model's predictions (SSreg / n), which I think is more appropriate given that this method is called `explainedVariance`, not `proportionVarianceExplained` (which would also be a bit redundant with `r2`).
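
To make the distinction concrete, here is a minimal sketch in plain Scala (no Spark) that fits a least-squares line with an intercept to a small made-up dataset and computes the three sums of squares. The object name `SumOfSquaresSketch`, the data values, and all `val` names are hypothetical and are not part of this PR; the sketch only illustrates that SStot = SSreg + SSerr holds for such a fit, that SSreg / SStot then matches R^2, and that SSreg / n is a different quantity.

```scala
// Illustration only: plain Scala, not the RegressionMetrics implementation.
object SumOfSquaresSketch {
  // Hypothetical toy data, made up for this example.
  val xs: Seq[Double] = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
  val ys: Seq[Double] = Seq(2.1, 3.9, 6.2, 7.8, 10.1)
  val n: Int = ys.size

  // Ordinary least squares fit of y = intercept + slope * x.
  val xMean: Double = xs.sum / n
  val yMean: Double = ys.sum / n
  val slope: Double = {
    val sxy = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
    val sxx = xs.map(x => math.pow(x - xMean, 2)).sum
    sxy / sxx
  }
  val intercept: Double = yMean - slope * xMean
  val predictions: Seq[Double] = xs.map(x => intercept + slope * x)

  // Total, regression, and error sums of squares.
  val ssTot: Double = ys.map(y => math.pow(y - yMean, 2)).sum
  val ssReg: Double = predictions.map(p => math.pow(p - yMean, 2)).sum
  val ssErr: Double =
    ys.zip(predictions).map { case (y, p) => math.pow(y - p, 2) }.sum

  // R^2 is a proportion; explained variance (SSreg / n) is in squared units of y.
  val r2: Double = 1 - ssErr / ssTot
  val explainedVariance: Double = ssReg / n

  def main(args: Array[String]): Unit = {
    println(f"SStot = $ssTot%.4f, SSreg + SSerr = ${ssReg + ssErr}%.4f")
    println(f"r2 = $r2%.4f, SSreg / SStot = ${ssReg / ssTot}%.4f")
    println(f"explainedVariance (SSreg / n) = $explainedVariance%.4f")
  }
}
```

The partition only holds here because the line is fit by least squares with an intercept; for predictions from an arbitrary or biased model, SSreg + SSerr need not equal SStot.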