[GitHub] spark pull request: [SPARK-14478][ML][MLLIB] Doc that StandardScal...

jkbradley Tue, 19 Apr 2016 23:03:07 -0700

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/12519


    [SPARK-14478][ML][MLLIB] Doc that StandardScaler uses the corrected sample std

    ## What changes were proposed in this pull request?
    
    Currently, MLlib's StandardScaler scales columns using the corrected sample standard deviation (the square root of the unbiased variance). This matches what R's scale function does.
    
    This PR documents this fact.
    
    ## How was this patch tested?
    
    Documentation-only change.
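
For readers unfamiliar with the distinction the description draws, here is a minimal sketch in plain Python. It is not Spark's implementation; the function names `corrected_std`, `population_std`, and `scale_to_unit_std` and the sample column are hypothetical and chosen only for illustration. The corrected sample standard deviation divides the sum of squared deviations by n - 1 before taking the square root, while the population standard deviation divides by n.

```python
import math
import statistics


def corrected_std(values):
    """Corrected sample standard deviation: sqrt of the unbiased variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    unbiased_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(unbiased_variance)


def population_std(values):
    """Population standard deviation: sqrt of the variance with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def scale_to_unit_std(values, std):
    """Divide every entry by the given standard deviation (no mean centering)."""
    return [x / std for x in values]


# Hypothetical feature column, used only for illustration.
column = [1.0, 2.0, 3.0, 4.0, 5.0]

# Scaling by the corrected std yields a column whose corrected std is 1.
scaled = scale_to_unit_std(column, corrected_std(column))
```

Python's standard library draws the same line: `statistics.stdev` uses the n - 1 divisor and `statistics.pstdev` uses n, so they can serve as a cross-check for the two helpers above. The gap between the two estimates shrinks as the number of rows grows, which is why the choice matters mostly for small datasets.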

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark scaler-variance-doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12519
    
----
commit 9cb04571d02a99e3e26a71c9addbfd8aba13e6d6
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-20T06:00:21Z

    Noted that StandardScaler uses the corrected sample std, not the unbiased std

----


