GitHub user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15683#discussion_r87487460
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -83,10 +83,11 @@ class GeneralizedLinearRegressionSuite
testData.toDF()
}
+ // force some labels to be exactly zero
datasetPoissonLog = generateGeneralizedLinearRegressionInput(
- intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
+ intercept = -1.5, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
xVariance = Array(0.7, 1.2), nPoints = 10000, seed, noiseLevel = 0.01,
- family = "poisson", link = "log").toDF()
+ family = "poisson", link = "log").map{x => LabeledPoint(if (x.label < 0.7) 0.0 else x.label, x.features)}.toDF()
--- End diff --
I'd suggest adding a test case that explicitly checks that Poisson accepts
zero-valued labels. Then we can use a very small dataset (five points or so)
that contains a zero label.
As written, if someone later changes the intercept to a larger value, the
data would no longer contain any zero labels and this case would not be
exercised.
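Something along these lines is what I have in mind. This is only a rough sketch: the object name `PoissonZeroLabelCheck` and the five data points are made up for illustration, and it is written as a standalone program rather than as a `test(...)` block in the suite. It assumes Spark 2.x ML is on the classpath, and whether `fit` succeeds depends on the build including the zero-label fix from this PR.

```scala
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.SparkSession

object PoissonZeroLabelCheck {
  // Hypothetical five-point dataset; the first label is exactly zero,
  // so the zero-label path is exercised regardless of any intercept.
  val tinyData: Seq[LabeledPoint] = Seq(
    LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
    LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
    LabeledPoint(2.0, Vectors.dense(2.0, 2.0)),
    LabeledPoint(3.0, Vectors.dense(3.0, 1.5)),
    LabeledPoint(6.0, Vectors.dense(4.0, 3.0))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("PoissonZeroLabelCheck")
      .getOrCreate()
    try {
      // createDataFrame yields the default "label" and "features" columns.
      val df = spark.createDataFrame(tinyData)
      val glr = new GeneralizedLinearRegression()
        .setFamily("poisson")
        .setLink("log")
      // The point of the check: fitting should not reject the zero label.
      val model = glr.fit(df)
      println(s"intercept = ${model.intercept}, coefficients = ${model.coefficients}")
    } finally {
      spark.stop()
    }
  }
}
```

In the suite itself the same data could be built with `Seq(...).toDF()` inside a dedicated test, which keeps the zero label independent of the generated `datasetPoissonLog`.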
Also, you can run the style checker via `./dev/lint-scala` to check that
the patch meets the style guidelines. This line is currently too long.