[GitHub] spark pull request: [SPARK-12804][ML] Fix LogisticRegression with ...

dbtsai Wed, 13 Jan 2016 13:03:07 -0800

Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10743#discussion_r49649843
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -339,9 +339,11 @@ class LogisticRegression @Since("1.2.0") (
              b = \log{P(1) / P(0)} = \log{count_1 / count_0}
              }}}
            */
    -      initialCoefficientsWithIntercept.toArray(numFeatures)
    -        = math.log(histogram(1) / histogram(0))
    -    }
    +       if (histogram.length >= 2) { // check to make sure indexing into histogram(1) is safe
    +         initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
    +           histogram(1) / histogram(0))
    --- End diff --
    
    In this case, the whole training step can be skipped. We currently support
    only binary logistic regression, so `histogram.length` is at most two. In
    linear regression, when `yStd == 0.0`, the model is returned immediately
    without training; see
    https://github.com/feynmanliang/spark/blob/SPARK-12804/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L226
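
    As a rough justification for skipping training, here is a short derivation
    sketch. It assumes an unregularized objective with all coefficients held at
    zero, so only the intercept `b` is free; `\ell` and `\sigma` are notation
    introduced here, while `count_0` and `count_1` are the label counts from the
    doc comment in the diff above.

    ```latex
    \ell(b) = count_1 \log \sigma(b) + count_0 \log\bigl(1 - \sigma(b)\bigr),
    \qquad \sigma(b) = \frac{1}{1 + e^{-b}}

    \ell'(b) = count_1 \bigl(1 - \sigma(b)\bigr) - count_0 \, \sigma(b) = 0
    \;\Longrightarrow\; b = \log\frac{count_1}{count_0}

    count_0 = 0:\quad \ell'(b) = count_1 \bigl(1 - \sigma(b)\bigr) > 0
    \;\Longrightarrow\; b \to +\infty

    count_1 = 0:\quad \ell'(b) = -\,count_0 \, \sigma(b) < 0
    \;\Longrightarrow\; b \to -\infty
    ```

    Under these assumptions, a training set with a single label value has no
    finite optimum for the intercept, so there is nothing for the optimizer to
    converge to.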
    
    We can do something similar here, for example:
    
    ```scala
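    if (histogram.length == 2) {
      if (histogram(0) == 0.0) {
        // Every label is 1: skip training and return a model that always predicts 1.
        model = new LogisticRegressionModel(
          uid, Vectors.sparse(numFeatures, Seq()), Double.PositiveInfinity)
        return model
      } else {
        // Both classes are present: start from the log-odds of the label counts.
        initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
          histogram(1) / histogram(0))
      }
    } else if (histogram.length == 1) {
      // Only label 0 was seen: skip training and return a model that always predicts 0.
      model = new LogisticRegressionModel(
        uid, Vectors.sparse(numFeatures, Seq()), Double.NegativeInfinity)
      return model
    } else {
      // Unexpected histogram length; the exception type here is only a placeholder.
      throw new IllegalArgumentException(
        s"Expected 1 or 2 label values for binary classification, got ${histogram.length}")
    }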
    ```
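
    To make the branching easier to check in isolation, here is a minimal,
    Spark-free sketch of the same decision. Everything in it is hypothetical and
    not part of Spark: the `InterceptSketch` object, the `Init` result type, and
    the `initialIntercept` helper are names made up for illustration. It assumes
    the histogram is indexed by label, so a length of 1 means only label 0 was
    seen. The `histogram(1) == 0.0` case is a defensive addition that is not in
    the suggestion above.

    ```scala
    // Hypothetical helper, not Spark code: decide what to do with the intercept
    // of a binary logistic regression given per-label counts.
    object InterceptSketch {
      sealed trait Init
      // Labels are constant: return a zero-coefficient model with this intercept.
      final case class SkipTraining(intercept: Double) extends Init
      // Both classes are present: start the optimizer from this intercept.
      final case class StartFrom(intercept: Double) extends Init

      // Assumption: histogram(i) is the count of label i.
      def initialIntercept(histogram: Array[Double]): Init = histogram.length match {
        case 1 =>
          SkipTraining(Double.NegativeInfinity) // only label 0 was seen
        case 2 if histogram(0) == 0.0 =>
          SkipTraining(Double.PositiveInfinity) // only label 1 was seen
        case 2 if histogram(1) == 0.0 =>
          SkipTraining(Double.NegativeInfinity) // defensive: explicit zero count for label 1
        case 2 =>
          StartFrom(math.log(histogram(1) / histogram(0))) // log-odds of the label counts
        case n =>
          throw new IllegalArgumentException(
            s"Expected a histogram of length 1 or 2, got $n")
      }
    }
    ```

    Returning a small result type instead of calling `return` from the middle of
    the training method keeps the decision testable on its own; the caller would
    still need to build the actual model for the `SkipTraining` case.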


