GitHub user actuaryzhang reopened a pull request:
https://github.com/apache/spark/pull/16344
[SPARK-18929][ML] Add Tweedie distribution in GLM
## What changes were proposed in this pull request?
I propose to add the full Tweedie family to the
GeneralizedLinearRegression model. The Tweedie family is characterized by a
power variance function. The currently supported Gaussian, Poisson and Gamma
families are special cases of the Tweedie family
(https://en.wikipedia.org/wiki/Tweedie_distribution).
@yanboliang @srowen @sethah
I propose to add support for the other distributions:
- compound Poisson: 1 < varPower < 2. This one is widely used to model
zero-inflated continuous distributions, e.g., in insurance, finance, ecology,
meteorology, advertising etc.
- positive stable: varPower > 2 and varPower != 3. Used to model extreme
values.
- inverse Gaussian: varPower = 3.
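
To make the power variance function concrete, here is a minimal Python sketch (not the Scala code in this PR). It assumes the usual parameterization V(mu) = mu^varPower; the function names `tweedie_variance` and `tweedie_member` are hypothetical and exist only for illustration.

```python
def tweedie_variance(mu: float, var_power: float) -> float:
    """Power variance function V(mu) = mu ** var_power (illustrative only)."""
    # Outside the Gaussian case (var_power == 0) the mean must be positive.
    if var_power != 0 and mu <= 0:
        raise ValueError("mu must be positive when var_power != 0")
    return mu ** var_power


def tweedie_member(var_power: float) -> str:
    """Name the Tweedie family member selected by var_power."""
    if var_power == 0:
        return "gaussian"
    if var_power == 1:
        return "poisson"
    if 1 < var_power < 2:
        return "compound Poisson"
    if var_power == 2:
        return "gamma"
    if var_power == 3:
        return "inverse Gaussian"
    if var_power > 2:
        return "positive stable"
    # No Tweedie model exists for 0 < var_power < 1; negative powers are
    # outside the cases discussed in this description.
    raise ValueError("var_power not covered by this sketch")
```

For example, `tweedie_member(1.5)` names the compound Poisson case, and `tweedie_variance(mu, 1.5)` grows faster than the Poisson variance but slower than the Gamma variance.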
The Tweedie family is supported in most statistical packages such as R
(statmod), SAS, h2o etc.
Changes made:
- Allow `tweedie` in family. Only `identity` and `log` links are allowed
for now.
- Add `varPower` to `GeneralizedLinearRegressionBase`, which takes values
in (1, 2) and (2, infty). Also set default value to 1.5 and add getter method.
- Add `Tweedie` class
- Add tests for tweedie GLM
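
The parameter constraints listed above can be summarized in a small Python sketch. This is not the PR's Scala validation logic; `validate_tweedie_params`, `ALLOWED_LINKS` and `DEFAULT_VAR_POWER` are hypothetical names, and the rules are taken only from this description (links `identity` and `log`, `varPower` in (1, 2) or (2, infinity), default 1.5).

```python
ALLOWED_LINKS = {"identity", "log"}   # links named in this description
DEFAULT_VAR_POWER = 1.5               # default stated in this description


def validate_tweedie_params(link: str, var_power: float = DEFAULT_VAR_POWER):
    """Check a (link, var_power) pair against the rules described above."""
    if link not in ALLOWED_LINKS:
        raise ValueError(f"unsupported link for tweedie: {link!r}")
    # Accept (1, 2) and (2, infinity); the end points 1 and 2 are rejected
    # here, matching the open intervals given in this description.
    if not (1 < var_power < 2 or var_power > 2):
        raise ValueError(f"var_power out of range: {var_power}")
    return link, float(var_power)
```

Calling `validate_tweedie_params("log")` returns `("log", 1.5)`, while `var_power=2.0` or an unlisted link raises `ValueError`.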
Note:
- In computing the deviance, use `math.max(y, 0.1)` to avoid taking the
inverse of 0. This is the same as in R: `tweedie()$dev.res`
- `aic` is not supported in this PR because evaluating the [Tweedie
density](http://www.statsci.org/smyth/pubs/tweediepdf-series-preprint.pdf) in
these cases is non-trivial. I will implement the density approximation method
in a future PR. R returns `null` (see `tweedie()$aic`).
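
As an illustration of the deviance note above, the following Python sketch computes a Tweedie unit deviance with the response clamped at 0.1. It assumes the textbook form 2 * (y * theta - kappa) and is not the PR's Scala implementation; `power_diff` and `tweedie_unit_deviance` are hypothetical names, and where exactly the PR applies the clamp may differ.

```python
import math


def power_diff(a: float, b: float, q: float) -> float:
    """(a**q - b**q) / q, with the q -> 0 limit log(a / b)."""
    if q == 0:
        return math.log(a / b)
    return (a ** q - b ** q) / q


def tweedie_unit_deviance(y: float, mu: float, var_power: float) -> float:
    """Unit deviance 2 * (y * theta - kappa) for one observation."""
    # Clamp the response so that y ** (1 - var_power) never inverts zero.
    y1 = max(y, 0.1)
    theta = power_diff(y1, mu, 1.0 - var_power)
    # The kappa term uses the raw response; for var_power > 2 this sketch
    # assumes y is strictly positive.
    kappa = power_diff(y, mu, 2.0 - var_power)
    return 2.0 * (y * theta - kappa)
```

With `var_power=1.5`, an observation `y=0.0` and fitted mean `mu=1.0` gives a finite deviance, and the deviance is zero whenever `y == mu` for `y` above the clamp.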
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/actuaryzhang/spark tweedie
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16344.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16344
----
commit 952887e485fb0d5fa669b3b4c9289b8069ee7769
Author: actuaryzhang <[email protected]>
Date: 2016-12-16T00:50:51Z
Add Tweedie family to GLM
commit 4f184ec458f5ed7d70bc5b8165481425f911d2a3
Author: actuaryzhang <[email protected]>
Date: 2016-12-19T22:50:02Z
Fix calculation in dev resid; Add test for different var power
commit 7fe39106332663d3671b94a8ffac48ca61c48470
Author: actuaryzhang <[email protected]>
Date: 2016-12-19T23:14:37Z
Merge test into GLR
commit bfcc4fb08d54156efc66b90d14c62ea7ff172afa
Author: actuaryzhang <[email protected]>
Date: 2016-12-20T22:59:05Z
Use Tweedie class instead of global object Tweedie; change variancePower to
varPower
commit a8feea7d8095170c1b5f18b7887f16a6d763e42c
Author: actuaryzhang <[email protected]>
Date: 2016-12-21T23:42:40Z
Allow Family to use GLRBase object directly
commit 233e2d338be8d36a74eaf578bfea804ae3617d4e
Author: actuaryzhang <[email protected]>
Date: 2016-12-22T01:56:34Z
Add TweedieFamily and implement specific distn within Tweedie
commit 17c55816c914bc96a8b5141356e3c117f343f303
Author: actuaryzhang <[email protected]>
Date: 2016-12-22T04:39:54Z
Clean up doc
commit 0b41825e99020976a34d8fe9c983f26de6c8c40f
Author: actuaryzhang <[email protected]>
Date: 2016-12-22T17:52:01Z
Move defaultLink and name to subclass of TweedieFamily
commit 6e8e60771afb4abe43e47c7fe186bad1541a8fac
Author: actuaryzhang <[email protected]>
Date: 2016-12-22T18:10:51Z
Change style for AIC
commit 8d7d34e258f9c7c03c80754d837ce847fcb0526e
Author: actuaryzhang <[email protected]>
Date: 2016-12-23T19:10:20Z
Rename Family methods and restore methods for tweedie subclasses
commit 6da7e3068e2c45a0faf7ff35c10b2750784d765e
Author: actuaryzhang <[email protected]>
Date: 2016-12-23T19:12:25Z
Update test
commit 9a71e89f629260c775922901a04c989f36ea4946
Author: actuaryzhang <[email protected]>
Date: 2016-12-27T17:16:40Z
Clean up doc
commit f461c09e65360f695ad3092b41bc26e0c61bbd95
Author: actuaryzhang <[email protected]>
Date: 2016-12-27T22:18:39Z
Put delta in Tweedie companion object
commit a839c4631dd17c4f3d0a0cc99e1b0af81419dda4
Author: actuaryzhang <[email protected]>
Date: 2016-12-27T22:23:57Z
Clean up doc
commit fab265278109eede4cce7ee506e8b29d481c4549
Author: actuaryzhang <[email protected]>
Date: 2017-01-05T19:32:06Z
Allow more link functions in tweedie
----