[GitHub] spark pull request #19638: [SPARK-22422][ML] Add Adjusted R2 to RegressionMe...

sethah Thu, 02 Nov 2017 14:24:32 -0700

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19638#discussion_r148664242
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala ---
    @@ -125,4 +125,14 @@ class RegressionMetrics @Since("2.0.0") (
           1 - SSerr / SStot
         }
       }
    +
    +  /**
    +   * Returns adjusted R^2^, the adjusted coefficient of determination.
    +   * @see <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2">
    +   * Coefficient of determination (Wikipedia)</a>
    +   */
    +  @Since("2.3.0")
    +  def r2adj: Double = {
    +    1 - (SSerr / (summary.count - summary.numParam - 1)) / (SStot / (summary.count - 1))
    --- End diff --
    
    This isn't correct when the model is fit without an intercept. This
    [previous PR](https://github.com/apache/spark/pull/10384/) is relevant.
    Actually, there's a bigger problem: `RegressionMetrics` is only passed
    predictions and observations, and nothing about the regression model that
    produced them. Adjusted r2 doesn't make sense here. In fact, r2 shouldn't
    be here either, since it's only valid for linear regression models.
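    
    To make the intercept issue concrete, here is a short derivation of what
    the expression in the diff reduces to. It writes n for `summary.count` and
    p for `summary.numParam`, and assumes p counts only the non-intercept
    coefficients (an assumption; the diff does not say):

    ```latex
    \begin{aligned}
    R^2_{\mathrm{adj}}
      &= 1 - \frac{SS_{\mathrm{err}} / (n - p - 1)}{SS_{\mathrm{tot}} / (n - 1)} \\
      &= 1 - \frac{SS_{\mathrm{err}}}{SS_{\mathrm{tot}}} \cdot \frac{n - 1}{n - p - 1} \\
      &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
    \end{aligned}
    ```

    Both `- 1` terms are the degree of freedom taken by the intercept, so the
    expression as written only covers the with-intercept case.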
    
    The solution I propose: add a `val r2adj` to the linear regression summary,
    defined simply in terms of the r2 value, and don't add it to regression
    metrics or the regression evaluator.
    
    ```scala
    val r2adj: Double = {
      // The intercept, if fit, uses one degree of freedom.
      val interceptDOF = if (privateModel.getFitIntercept) 1 else 0
      1 - (1 - r2) * (numInstances - interceptDOF) /
        (numInstances - privateModel.coefficients.size - interceptDOF)
    }
    ```
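    
    For trying the formula outside Spark, a minimal standalone sketch of the
    same computation follows. The object name, method name and parameters are
    all hypothetical and not part of any Spark API; the `require` guard is an
    addition that the snippet above does not have.

    ```scala
    object AdjustedR2Sketch {

      /**
       * Adjusted R^2 computed from plain R^2 (hypothetical helper, not Spark API).
       *
       * @param r2              ordinary coefficient of determination
       * @param numInstances    number of training rows
       * @param numCoefficients number of fitted coefficients, excluding the intercept
       * @param fitIntercept    whether an intercept was fit
       */
      def adjustedR2(
          r2: Double,
          numInstances: Long,
          numCoefficients: Int,
          fitIntercept: Boolean): Double = {
        // The intercept, if fit, uses one degree of freedom.
        val interceptDOF = if (fitIntercept) 1 else 0
        val residualDOF = numInstances - numCoefficients - interceptDOF
        // Added guard: the ratio is undefined without residual degrees of freedom.
        require(residualDOF > 0, "need more instances than fitted parameters")
        1 - (1 - r2) * (numInstances - interceptDOF) / residualDOF
      }

      def main(args: Array[String]): Unit = {
        val withIntercept = adjustedR2(0.9, 100L, 3, fitIntercept = true)
        val throughOrigin = adjustedR2(0.9, 100L, 3, fitIntercept = false)
        // Prints approximately: 0.896875 0.896907
        println(f"$withIntercept%.6f $throughOrigin%.6f")
      }
    }
    ```

    With r2 = 0.9, 100 instances and 3 coefficients, the two cases differ only
    in the degrees-of-freedom ratio: 99/96 with an intercept, 100/97 without.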
    
    Ok, but then you can't use it when doing cross validation, right? I'm not
    sure if there's a solution there; maybe make a `LinearRegressionEvaluator`?
    `r2` and `adjr2` are not valid for non-linear regression (see
    http://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/).


