GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/9413#discussion_r44046156
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
}
+ /** Number of instances in DataFrame predictions */
+ lazy val numInstances: Long = predictions.count()
+
+ /** Degrees of freedom */
+ private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+ numInstances - model.coefficients.size - 1
+ } else {
+ numInstances - model.coefficients.size
+ }
+
+ /**
+ * The weighted residuals, the usual residuals rescaled by
+ * the square root of the instance weights.
+ */
+ lazy val devianceResiduals: Array[Double] = {
--- End diff --
I'm late to comment, but am wondering:
* Why do we not return all deviance residuals as a DataFrame? If we only
return min and max, then that should be documented. But I'd prefer we
return a DataFrame with all deviance residuals.
* Should we follow R's example and just call this "residuals"? That would
let us add other types of residuals later, specified via an argument with a
default argument of "deviance". A rough sketch of that shape follows below.
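
To make the second suggestion concrete, here is a minimal, hypothetical sketch of what an R-style `residuals` method on the summary class might look like. It assumes the summary already exposes `predictions`, `predictionCol`, `labelCol`, and a `weightCol` column name; the method name, the `"deviance"` default, and the weight handling are all illustrative, not the final API.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical sketch: return all residuals of the requested type as a
// single-column DataFrame instead of a (min, max) pair.
def residuals(residualsType: String = "deviance"): DataFrame = residualsType match {
  case "deviance" =>
    // Deviance residual for least squares: the raw residual rescaled by
    // the square root of the instance weight. (The unweighted case, where
    // the weight is implicitly 1.0, is omitted here for brevity.)
    val dr = udf { (label: Double, pred: Double, weight: Double) =>
      (label - pred) * math.sqrt(weight)
    }
    predictions.select(
      dr(col(labelCol), col(predictionCol), col(weightCol)).as("residuals"))
  case other =>
    throw new IllegalArgumentException(
      s"Residual type '$other' is not supported.")
}
```

With that shape, `summary.residuals()` would return the full deviance-residual DataFrame, and other residual types could be added later behind the same argument without breaking callers.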