GitHub user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/16699
  
    @sethah Thanks much for your review. 
    
    Regarding prediction, both R and my implementation here allow prediction 
with offsets. If users want predicted rates (instead of counts), they can 
set the offset to `lit(0.0)` in the new data set when making predictions. 
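    
    To spell out why a zero offset gives rates, here is a short sketch for the 
log-link case. The symbols are notation assumed here for illustration (they 
are not taken from the PR), and it assumes the offset is the log of an 
exposure:
    ```latex
    \begin{align*}
    \eta_i &= x_i^\top \beta + \beta_0 + o_i, \\
    \hat{\mu}_i &= \exp(\eta_i) = \exp(o_i)\,\exp(x_i^\top \beta + \beta_0).
    \end{align*}
    % With o_i = \log(\text{exposure}_i), the first factor is the exposure, so
    % setting o_i = 0 leaves \exp(x_i^\top \beta + \beta_0): the predicted rate
    % per unit of exposure.
    ```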
    
    The following shows that R prediction includes offsets:
    ```
    A <- matrix(c(0, 1, 2, 3, 5, 7, 11, 13), 4, 2)
    b <- c(1, 0, 0.2, 2)
    off <- c(2, 3, 5, 4)
    df <- as.data.frame(cbind(A, b))
    model <- glm(formula = "b ~ .", family = "poisson", data = df, 
                 offset = off)
    p1 <- predict(model, df, type = "response")
    p2 <- as.numeric(exp(cbind(1, A) %*% coef(model) + off))
    sum(p1 - p2)
    ```
    
    This part of the code in the GeneralizedLinearRegression file shows that we 
do need the offset when making the prediction:
    ```
      protected def predict(features: Vector, offset: Double): Double = {
        val eta = predictLink(features, offset)
        familyAndLink.fitted(eta)
      }
    ```
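    
    As a rough illustration of the same computation outside Spark, here is a 
minimal plain-Scala sketch for the Poisson family with a log link. The object 
name, method signatures, and coefficient values are all hypothetical and are 
not the code in this PR:
    ```scala
    // Plain-Scala sketch with no Spark dependency; every name is hypothetical.
    object OffsetPredictionSketch {
      // Linear predictor on the link scale: eta = x . beta + intercept + offset.
      def predictLink(features: Array[Double], coefficients: Array[Double],
                      intercept: Double, offset: Double): Double = {
        require(features.length == coefficients.length, "dimension mismatch")
        features.zip(coefficients).map { case (x, b) => x * b }.sum + intercept + offset
      }

      // Poisson family with log link: the fitted mean is exp(eta).
      def predict(features: Array[Double], coefficients: Array[Double],
                  intercept: Double, offset: Double): Double =
        math.exp(predictLink(features, coefficients, intercept, offset))

      def main(args: Array[String]): Unit = {
        val coefficients = Array(0.5, -0.25) // made-up values for illustration
        val intercept = 0.1
        val features = Array(1.0, 2.0)
        val exposure = 3.0
        // Offset = log(exposure) gives a predicted count; offset = 0 gives a rate.
        val count = predict(features, coefficients, intercept, math.log(exposure))
        val rate = predict(features, coefficients, intercept, 0.0)
        // With a log link, count equals exposure * rate up to rounding error.
        println(math.abs(count - exposure * rate) < 1e-12)
      }
    }
    ```
    Passing an offset of 0.0 here plays the same role as setting the offset 
column to `lit(0.0)` in the new data set.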
    
    And this is the test for the prediction:
    ``` 
    val familyLink = FamilyAndLink(trainer)
    model.transform(dataset).select("features", "offset", "prediction", "linkPrediction")
      .collect().foreach {
        case Row(features: DenseVector, offset: Double, prediction1: Double,
            linkPrediction1: Double) =>
          val eta = BLAS.dot(features, model.coefficients) + model.intercept + offset
          val prediction2 = familyLink.fitted(eta)
          val linkPrediction2 = eta
          assert(prediction1 ~= prediction2 relTol 1E-5, "Prediction mismatch: GLM with " +
            s"family = $family, and fitIntercept = $fitIntercept.")
          assert(linkPrediction1 ~= linkPrediction2 relTol 1E-5, "Link Prediction mismatch: " +
            s"GLM with family = $family, and fitIntercept = $fitIntercept.")
      }
    ```
    
    Does this make sense?

