[GitHub] spark pull request #15683: [SPARK-18166][MLlib] Fix Poisson GLM bug due to w...

sethah Thu, 10 Nov 2016 13:19:12 -0800

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15683#discussion_r87487460
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
    @@ -83,10 +83,11 @@ class GeneralizedLinearRegressionSuite
           testData.toDF()
         }
     
    +    // force some labels to be exactly zero
         datasetPoissonLog = generateGeneralizedLinearRegressionInput(
    -      intercept = 0.25, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
    +      intercept = -1.5, coefficients = Array(0.22, 0.06), xMean = Array(2.9, 10.5),
           xVariance = Array(0.7, 1.2), nPoints = 10000, seed, noiseLevel = 0.01,
    -      family = "poisson", link = "log").toDF()
    +      family = "poisson", link = "log").map{x => LabeledPoint(if (x.label < 0.7) 0.0 else x.label, x.features)}.toDF()
    --- End diff --
    
    I'd suggest adding a dedicated test case that checks that Poisson accepts
zero-valued labels. It can use a very small dataset (five points or so)
that contains at least one zero label.
    
    As written, zero labels appear only because the intercept is small enough
for some generated labels to fall below the 0.7 threshold. If someone later
changes the intercept to a larger value, the data would no longer contain any
zero labels, and this case would silently stop being exercised.
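    
    A minimal sketch of what such a check could look like, written as a
standalone program rather than as a case inside the suite. The object name
`PoissonZeroLabelCheck` and the five data points are hypothetical and chosen
only for illustration; inside GeneralizedLinearRegressionSuite the same idea
would use the suite's own Spark session and implicits instead of creating one.

```scala
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.SparkSession

object PoissonZeroLabelCheck {
  // Hypothetical hand-written data: five points, two with a label of exactly zero,
  // so the zero-label case does not depend on any generator parameters.
  val points: Seq[LabeledPoint] = Seq(
    LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
    LabeledPoint(1.0, Vectors.dense(1.0, 2.0)),
    LabeledPoint(0.0, Vectors.dense(0.5, 0.5)),
    LabeledPoint(3.0, Vectors.dense(2.0, 3.0)),
    LabeledPoint(2.0, Vectors.dense(1.5, 1.0))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("PoissonZeroLabelCheck")
      .getOrCreate()
    import spark.implicits._

    try {
      val dataset = points.toDF()
      val glr = new GeneralizedLinearRegression()
        .setFamily("poisson")
        .setLink("log")

      // The point of the check: fitting must not reject the zero labels.
      val model = glr.fit(dataset)

      // Sanity check only; this sketch does not compare against reference values.
      assert(model.coefficients.size == 2)
    } finally {
      spark.stop()
    }
  }
}
```

    Because the zero labels are written out by hand, the check keeps
exercising the zero-label path even if the intercept used for
datasetPoissonLog changes later.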
    
    Also, you can run the style checker via `./dev/lint-scala` to check that 
the patch meets the style guidelines. The modified line is currently too long.



