Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16344
Sorry about closing this prematurely. I'm giving it another shot, and I
think I have an elegant solution for including `linkPower`. The new commit adds
the following:
1. It implements the broad family of power link functions, specified through
`linkPower`. There is now a `PowerLink` class for the power link function, with
subclasses `Identity`, `Log`, `Inverse` and `Sqrt`. With this, the GLM now
supports all distributions characterized by a power variance function and a power
link function. For now, I restrict `linkPower` to `[-10, 10]` for
numerical stability, but this can be changed.
2. The key to avoiding all the messy code is to use **only** `link` for
non-Tweedie families and **only** `linkPower` for the Tweedie family, as @yanboliang
suggested. I have added validation for this, as well as a `fromParams` method
in `Link` that returns the correct `Link` object based on `link` and `linkPower`.
3. I added new tests for the Tweedie family with the default link. For example, with
`variancePower = 0` (where the default link is identity), they reproduce the
Gaussian GLM estimate. Similarly, this addresses @yanboliang's example:
`val trainer = new GeneralizedLinearRegression().setFamily("tweedie").setVariancePower(1.5)`
produces the same estimate as R's `glm(formula = "b ~ .", family =
tweedie(var.power=1.5), data = df)`.
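The power link family in item 1 can be sketched roughly as below. This is a hypothetical, simplified model, not the code in the commit: only the class names and the `[-10, 10]` bound come from the comment, while the scalar `link`, `deriv` and `unlink` methods are assumptions. A power of 0 is treated as the log link, following the usual convention (as in R's `power(0)`).

```scala
// Hypothetical sketch of a power link family g(mu) = mu^p (log when p == 0).
// Not the code from the commit: method names and signatures are assumptions.
class PowerLink(val linkPower: Double) {
  // Mirrors the [-10, 10] restriction mentioned in the comment.
  require(linkPower >= -10.0 && linkPower <= 10.0,
    s"linkPower must be in [-10, 10], but got $linkPower")

  private def isLog: Boolean = linkPower == 0.0

  /** Link function g(mu): mu^p, or log(mu) when p == 0. */
  def link(mu: Double): Double =
    if (isLog) math.log(mu) else math.pow(mu, linkPower)

  /** Derivative g'(mu): p * mu^(p - 1), or 1 / mu when p == 0. */
  def deriv(mu: Double): Double =
    if (isLog) 1.0 / mu else linkPower * math.pow(mu, linkPower - 1.0)

  /** Inverse link g^{-1}(eta): eta^(1 / p), or exp(eta) when p == 0. */
  def unlink(eta: Double): Double =
    if (isLog) math.exp(eta) else math.pow(eta, 1.0 / linkPower)
}

// Named special cases of the power family.
object Identity extends PowerLink(1.0)
object Log extends PowerLink(0.0)
object Inverse extends PowerLink(-1.0)
object Sqrt extends PowerLink(0.5)
```

Because the named links are just instances of one class, the derivative and inverse formulas are written once; an arbitrary power such as `new PowerLink(1.5)` works the same way.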
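The rule in item 2 (only `link` for non-Tweedie families, only `linkPower` for Tweedie) can be illustrated with a minimal sketch. Everything here is hypothetical: `validateParams`, `fromParams`, `NamedLink` and `PowerLinkOf` are stand-ins with assumed signatures, not Spark's actual `Link.fromParams`, and unset parameters are modeled as `Option`. The Tweedie default `linkPower = 1 - variancePower` is also an assumption, borrowed from R's `statmod::tweedie` default; it yields the identity link at `variancePower = 0`, consistent with item 3.

```scala
// Hypothetical sketch of the "only `link` / only `linkPower`" rule.
// The names below are illustrative stand-ins, not Spark's API.
trait ResolvedLink
final case class NamedLink(name: String) extends ResolvedLink
final case class PowerLinkOf(power: Double) extends ResolvedLink

// Assumed canonical default links for the non-Tweedie families.
def defaultLink(family: String): String = family match {
  case "gaussian" => "identity"
  case "binomial" => "logit"
  case "poisson"  => "log"
  case "gamma"    => "inverse"
  case other      => throw new IllegalArgumentException(s"Unsupported family: $other")
}

// Reject parameter combinations that mix the two ways of specifying a link.
def validateParams(family: String, link: Option[String], linkPower: Option[Double]): Unit = {
  if (family == "tweedie") {
    require(link.isEmpty, "For the tweedie family, specify linkPower instead of link.")
  } else {
    require(linkPower.isEmpty, s"linkPower is only supported for the tweedie family, not $family.")
  }
}

// Resolve the link from the parameters after validation.
def fromParams(
    family: String,
    link: Option[String],
    linkPower: Option[Double],
    variancePower: Double): ResolvedLink = {
  validateParams(family, link, linkPower)
  if (family == "tweedie") {
    // Assumed default: canonical link power 1 - variancePower (as in R's statmod::tweedie).
    PowerLinkOf(linkPower.getOrElse(1.0 - variancePower))
  } else {
    NamedLink(link.getOrElse(defaultLink(family)))
  }
}
```

For example, `fromParams("tweedie", None, None, 1.5)` resolves to `PowerLinkOf(-0.5)`, while passing `Some("log")` as `link` for the Tweedie family fails validation.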
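For reference, the model class in items 1 and 3 can be written as below, with $p$ = `variancePower` and $q$ = `linkPower`. Taking the canonical power $q = 1 - p$ as the default is an assumption here (it matches R's `statmod::tweedie`); under it, $p = 0$ gives constant variance and the identity link, which is the Gaussian GLM that the new test compares against.

$$
V(\mu) = \mu^{p}, \qquad
g(\mu) =
\begin{cases}
  \mu^{q}, & q \neq 0, \\
  \log \mu, & q = 0,
\end{cases}
\qquad q_{\text{canonical}} = 1 - p
$$

$$
p = 0,\; q = 1 - 0 = 1 \;\Rightarrow\; V(\mu) = 1,\; g(\mu) = \mu \quad \text{(Gaussian family, identity link)}
$$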
@yanboliang @srowen Would you please take another look and let me know if
any additional changes are needed? Thanks.