GitHub user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6386#discussion_r30996268
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -160,8 +168,8 @@ class LogisticRegression(override val uid: String)
           new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
         }
     
    -    val initialWeightsWithIntercept =
    -      Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
    +    val initialWeightsWithIntercept = optInitialWeights.getOrElse(
    --- End diff --
    
    In logistic regression (LoR), the intercept represents the prior of the class distribution, so the optimizer converges faster if we initialize the intercept according to that prior whenever an intercept is included. That is why the intercept is overridden in the next few lines:
    ```
        if ($(fitIntercept)) {
          /**
           * For binary logistic regression, when we initialize the weights as zeros,
           * it will converge faster if we initialize the intercept such that
           * it follows the distribution of the labels.
           *
           * {{{
           * P(0) = 1 / (1 + \exp(b)), and
           * P(1) = \exp(b) / (1 + \exp(b))
           * }}}, hence
           * {{{
           * b = \log{P(1) / P(0)} = \log{count_1 / count_0}
           * }}}
           */
          initialWeightsWithIntercept.toArray(numFeatures)
            = math.log(histogram(1).toDouble / histogram(0).toDouble)
        }
    ```  
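    A quick way to sanity-check the formula above: a minimal sketch in plain Scala (no Spark dependency) that computes the prior-based intercept from the two label counts and confirms that the sigmoid of that intercept recovers P(1). The object `InterceptPrior` and its methods are hypothetical names for illustration, not Spark APIs.

```scala
object InterceptPrior {
  // Hypothetical helper: b = log(count_1 / count_0), taken from the label histogram.
  def priorIntercept(count0: Long, count1: Long): Double = {
    require(count0 > 0 && count1 > 0, "both classes must be present")
    math.log(count1.toDouble / count0.toDouble)
  }

  // P(1) = exp(b) / (1 + exp(b)), written as the sigmoid of the intercept b.
  def sigmoid(b: Double): Double = 1.0 / (1.0 + math.exp(-b))
}
```

    For example, with 30 negatives and 70 positives, `priorIntercept(30, 70)` is log(7/3), and its sigmoid is 0.7 (up to floating-point rounding), i.e. the fraction of positive labels.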
    
    When a custom intercept is specified, we should not override it by executing the code above. We may also check the dimension of the custom initial weights; if it does not agree with the one generated programmatically, we should fall back to the programmatically generated weights and log it.

