GitHub user tengpeng opened a pull request:

    https://github.com/apache/spark/pull/19638

    [SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics

    ## What changes were proposed in this pull request?
    
    I added adjusted R2 as a regression metric. It is implemented in all 
major statistical analysis tools.
    
    In practice, R2 is rarely looked at alone, because on its own it is 
misleading. Adding more parameters never decreases R2; it can only increase 
(or stay the same), which rewards overfitting. Adjusted R2 addresses this issue 
by using the number of parameters as a "weight" for the sum of errors.
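
For reference, the commonly used textbook definition is
adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of
observations and p is the number of predictors (excluding the intercept). The
sketch below illustrates that formula in plain Python. It is not the code in
this patch, and the function names `r2_score` and `adjusted_r2` are
hypothetical.

```python
def r2_score(y_true, y_pred):
    """Plain R2: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Textbook adjusted R2 for n observations and p predictors.

    Assumption: p excludes the intercept, so n - p - 1 is the residual
    degrees of freedom and must be positive.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 to compute adjusted R2")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Same R2, more predictors: the adjusted value drops.
    print(adjusted_r2(0.9, n=11, p=2))
    print(adjusted_r2(0.9, n=11, p=5))
```

With R2 held fixed, raising p lowers the adjusted value, which is the penalty
on extra parameters described above.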
    
    
    ## How was this patch tested?
    
    - Added a new unit test, which passes.
    - Ran ./dev/run-tests; all tests passed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tengpeng/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19638
    
----
commit adee7b418f9e9feb70ec9abfaba9ab34c789523b
Author: test <[email protected]>
Date:   2017-11-02T05:01:55Z

    Implement Adjusted R2 with a new unit test

commit 692fcb3dd332c677d9dd4f75ebb3ed14db495d7c
Author: test <[email protected]>
Date:   2017-11-02T05:03:12Z

    Merge branch 'master' of git://git.apache.org/spark

----

