Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13729#discussion_r67528031
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
    @@ -990,11 +987,11 @@ private class LogisticAggregator(
                 var sum = 0.0
                 features.foreachActive { (index, value) =>
                   if (featuresStd(index) != 0.0 && value != 0.0) {
    -                sum += localCoefficientsArray(index) * (value / 
featuresStd(index))
    +                sum += coefficientsArray(index) * (value / 
featuresStd(index))
    --- End diff --
    
    I believe this is by design rather than an oversight (perhaps the mean was 
also passed in case future versions decided to mean-center the 
data). This comment is from MLlib:
    
    ```
         * Here, if useFeatureScaling is enabled, we will standardize the 
training features by dividing
         * the variance of each column (without subtracting the mean), and 
train the model in the
         * scaled space. Then we transform the coefficients from the scaled 
space to the original scale
         * as GLMNET and LIBSVM do.
    ```
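
    To illustrate the point in that comment: because the features are only divided by their standard deviation (no mean subtraction), coefficients learned in the scaled space can be mapped back to the original scale by dividing each coefficient by the same standard deviation, and the margin is unchanged. A minimal sketch of that identity (NumPy stand-in, not Spark code; the array values are made up for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical feature matrix with unequal column scales.
    X = rng.normal(loc=3.0, scale=[1.0, 2.0, 0.5], size=(100, 3))

    # Standardize by dividing by the column std, WITHOUT subtracting the mean,
    # as the MLlib comment describes.
    std = X.std(axis=0)
    X_scaled = X / std

    # Pretend these coefficients were learned in the scaled space.
    w_scaled = np.array([0.4, -1.2, 2.0])
    intercept = 0.7

    # Transform back to the original scale, as GLMNET/LIBSVM do:
    # margin = (x / std) . w_scaled = x . (w_scaled / std)
    w_orig = w_scaled / std

    margin_scaled = X_scaled @ w_scaled + intercept
    margin_orig = X @ w_orig + intercept

    # The two margins agree, so no mean needs to be stored for the transform.
    print(np.allclose(margin_scaled, margin_orig))
    ```

    This is why dividing by the std in the aggregator (the `value / featuresStd(index)` term in the diff above) is sufficient on its own: the back-transformation only needs the stds, and the mean is carried along purely for possible future use.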

