Feynman Liang created SPARK-12804:
-------------------------------------

             Summary: ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
                 Key: SPARK-12804
                 URL: https://issues.apache.org/jira/browse/SPARK-12804
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 1.6.0
            Reporter: Feynman Liang


When LogisticRegression is trained on a dataset whose labels are all 0 or all 
1, an array index out of bounds exception is thrown. The problematic code is:

{code}
    if ($(fitIntercept)) {
      initialCoefficientsWithIntercept.toArray(numFeatures)
        = math.log(histogram(1) / histogram(0))
    }
{code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (detectable from {{labelSummarizer}}) and return a classifier 
that assigns every example to that label (all true or all false) with infinite 
weights.
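
A minimal sketch of the proposed check, in plain Scala with no Spark dependency. It is not the actual patch: the object and method names are hypothetical, labels are assumed to be binary (0.0 or 1.0), and the histogram is assumed to be indexed by label value and sized to the largest label seen plus one. Under those assumptions, a constant-label dataset is detected before the log-odds are computed, and the intercept is set to negative or positive infinity instead.

```scala
// Sketch only: every name below is hypothetical, not Spark API.
object SingleLabelSketch {

  /** Per-label counts indexed by label value; length is (largest label + 1). */
  def histogram(labels: Seq[Double]): Array[Long] = {
    require(labels.nonEmpty, "need at least one label")
    val counts = Array.ofDim[Long](labels.max.toInt + 1)
    labels.foreach(l => counts(l.toInt) += 1L)
    counts
  }

  /** Some(label) when exactly one label has a nonzero count, otherwise None. */
  def singleLabel(hist: Array[Long]): Option[Int] = {
    val present = hist.indices.filter(i => hist(i) > 0)
    if (present.length == 1) Some(present.head) else None
  }

  /**
   * Initial intercept: the log-odds of the labels when both classes are
   * present, and -Infinity / +Infinity when every label is 0 / 1.
   * The single-label check runs first, so hist(1) is never read on a
   * one-element histogram.
   */
  def initialIntercept(hist: Array[Long]): Double = singleLabel(hist) match {
    case Some(0) => Double.NegativeInfinity
    case Some(_) => Double.PositiveInfinity
    case None    => math.log(hist(1).toDouble / hist(0).toDouble)
  }
}
```

With an infinite intercept and zero coefficients, the sigmoid evaluates to exactly 0 or 1 for every input, which is one way to read "infinite weights" above.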


