Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16131
@srowen
Try the example below, or the one @sethah had trouble with in #15683.
I have tried running the Spark 2.1 Poisson GLM on our data, and it fails
on most of our datasets (it sometimes works when there are not many zeros).
I traced down the cause, and the fix proposed here seems to be the correct
one. At the very least, the Poisson GLM now works on the data where it
failed before.
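```scala
// Run in spark-shell, where `spark` (a SparkSession) is already defined.
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import spark.implicits._ // needed for .toDF(); spark-shell imports it automatically

// Count data where most labels are zero; each row has two features.
val datasetPoissonLogWithZero = Seq(
  LabeledPoint(0.0, Vectors.dense(18, 1.0)),
  LabeledPoint(1.0, Vectors.dense(12, 0.0)),
  LabeledPoint(0.0, Vectors.dense(15, 0.0)),
  LabeledPoint(0.0, Vectors.dense(13, 2.0)),
  LabeledPoint(0.0, Vectors.dense(15, 1.0)),
  LabeledPoint(1.0, Vectors.dense(16, 1.0)),
  LabeledPoint(0.0, Vectors.dense(10, 0.0)),
  LabeledPoint(0.0, Vectors.dense(15, 0.0)),
  LabeledPoint(0.0, Vectors.dense(12, 2.0)),
  LabeledPoint(0.0, Vectors.dense(13, 0.0)),
  LabeledPoint(1.0, Vectors.dense(15, 0.0)),
  LabeledPoint(1.0, Vectors.dense(15, 0.0)),
  LabeledPoint(0.0, Vectors.dense(15, 0.0)),
  LabeledPoint(0.0, Vectors.dense(12, 2.0)),
  LabeledPoint(1.0, Vectors.dense(12, 2.0))
).toDF()

// Unregularized Poisson GLM with a log link (fit by IRLS).
val glr = new GeneralizedLinearRegression()
  .setFamily("poisson")
  .setLink("log")
  .setMaxIter(20)
  .setRegParam(0)
val model = glr.fit(datasetPoissonLogWithZero)
println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")
```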
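The comment doesn't spell out the cause, so the following is only a hypothetical sketch. It assumes the failure comes from the first IRLS step initializing the mean `mu` at the label `y`. With a log link, a zero label then gives `log(0) = -Infinity` and a non-finite working response. The object and its helpers (`PoissonInitSketch`, `naiveInit`, `flooredInit`, `Delta = 0.1`) are illustrative names, not Spark's API. The floor is in the same spirit as R's `poisson()$initialize`, which starts from `y + 0.1`.

```scala
// Hypothetical illustration, not Spark's code: quantities used by the first
// IRLS step of a Poisson GLM with a log link, under two initializations of mu.
object PoissonInitSketch {
  // Assumed floor for the initial mean (illustrative value).
  val Delta = 0.1

  // Log link: eta = log(mu), with derivative d(eta)/d(mu) = 1 / mu.
  def link(mu: Double): Double = math.log(mu)
  def linkDeriv(mu: Double): Double = 1.0 / mu

  // Working response: z = eta + (y - mu) * g'(mu).
  def workingResponse(y: Double, mu: Double): Double =
    link(mu) + (y - mu) * linkDeriv(mu)

  // IRLS weight 1 / (V(mu) * g'(mu)^2); for Poisson V(mu) = mu, so this is mu.
  def weight(mu: Double): Double = mu

  // Start from the label itself (breaks when y == 0).
  def naiveInit(y: Double): Double = y
  // Start from the label, floored at Delta so log(mu) stays finite.
  def flooredInit(y: Double): Double = math.max(y, Delta)

  def main(args: Array[String]): Unit = {
    for (y <- Seq(0.0, 1.0)) {
      val mu0 = naiveInit(y)
      val mu1 = flooredInit(y)
      println(s"y=$y naive: z=${workingResponse(y, mu0)}, w=${weight(mu0)}; " +
        s"floored: z=${workingResponse(y, mu1)}, w=${weight(mu1)}")
    }
  }
}
```

For `y = 0` the naive start gives `z = NaN` and weight `0`, while the floored start gives a finite `z = log(0.1) - 1` and weight `0.1`. Nonzero labels are unaffected by the floor.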