[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r99269006 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -743,6 +743,55 @@ class

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @srowen would you please take a look and merge this if all is good? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Your formula for offset does not seem to be a general solution, and I'm not sure if there exists an analytical formula, in particular when the link function is not identity or log

[GitHub] spark issue #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for SparkR

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16729 @felixcheung The one from statmod will be masked and must be called using `statmod:tweedie`. We can copy the whole `tweedie` function from statmod into `SparkR` and this will avoid

[GitHub] spark issue #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for SparkR

2017-02-01 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16729 @felixcheung Great point! Yes, I think it's better to stick with the statmod syntax and allow the tweedie family to be specified as`tweedie(var.power, link.power)`. I tried a few ways to allow

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for the clarification and providing an implementation. So, the pros is some speed improvement and the cons is the increased complexity (now we have three case: one for intercept

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for your input. I can add more tests, but they are not adding too much since the algorithm is already tested in other tests. The analytical approach does not integrate

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Yes, we can directly compute the intercept easily. But I'm concerned that such special handling may not integrate well with other features or future changes. For example, we will need

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for your review. Yes, using `foldLeft` would be the simplest fix. I have included both your suggested changes in the new commit. Yes, we could handle the special case

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r98580805 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -336,14 +361,19 @@ class

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @yanboliang @zhengruifeng @srowen Could you guys take a look and let me know if there is any changes needed? Thanks much! --- If your project is set up for it, you can reply

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @imatiach-msft Thanks much for your review. Renamed `off`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r98496017 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -336,14 +361,19 @@ class

[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16740#discussion_r98493237 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala --- @@ -86,13 +86,11 @@ private[ml] class

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @imatiach-msft Thanks for the comments! This test is based on existing tests in GLM. I can try to improve the style and streamline **all** tests in another PR but it will be weird to just

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @srowen Thanks much for the suggestion. Included the simplification. Please let me know if there is anything else needed for this PR. Thanks! --- If your project is set up for it, you can

[GitHub] spark issue #16740: [SPARK-19400] Allow GLM to handle intercept only model

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 The following is a simple example to illustrate the issue. ``` val dataset = Seq( (1.0, 1.0, 2.0, 0.0, 5.0), (0.5, 2.0, 1.0, 1.0, 2.0

[GitHub] spark pull request #16740: [SPARK-19400] Allow GLM to handle intercept only ...

2017-01-30 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16740 [SPARK-19400] Allow GLM to handle intercept only model ## What changes were proposed in this pull request? Intercept-only GLM is failing for non-Gaussian family because of reducing

[GitHub] spark pull request #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for Spa...

2017-01-29 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16729#discussion_r98363560 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala --- @@ -143,7 +150,12 @@ private[r] object

[GitHub] spark pull request #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for Spa...

2017-01-29 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16729#discussion_r98363550 --- Diff: R/pkg/inst/tests/testthat/test_mllib_regression.R --- @@ -77,6 +77,18 @@ test_that("spark.glm and predict", { out <- c

[GitHub] spark issue #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for SparkR

2017-01-29 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16729 @felixcheung Thanks so much for your quick and detailed review. I have made a new commit that removed dependency on `statmod` and fixed the issues you pointed out. The major change is to add

[GitHub] spark issue #16730: [SPARK-19395][SparkR]Convert coefficients in summary to ...

2017-01-29 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16730 @felixcheung I created a JIRA ticket and added in some tests. Please take a look. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request #16730: [Minor][SparkR]Convert coefficients in summary to...

2017-01-28 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16730 [Minor][SparkR]Convert coefficients in summary to matrix ## What changes were proposed in this pull request? The `coefficients` component in model summary should be 'matrix

[GitHub] spark pull request #16729: [SPARK-19391][SparkR][ML] Tweedie GLM API for Spa...

2017-01-28 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16729 [SPARK-19391][SparkR][ML] Tweedie GLM API for SparkR ## What changes were proposed in this pull request? Port Tweedie GLM #16344 to SparkR @felixcheung @yanboliang

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-27 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @zhengruifeng Thanks for the suggestions. Added casting and instrumentation. @imatiach-msft Thanks for the clarification! It is probably worth another PR to clean up all tests in GLM

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-26 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r98077231 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,25 @@ import org.apache.spark.ml.linalg.Vector

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-26 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @imatiach-msft Many thanks again for the review. I have incorporated some of your suggestions: 1. Create initialization of instance directly if it is Gaussian(identity) to avoid expensive

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-25 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @imatiach-msft Thanks so much for your detailed review. Incredibly helpful. I've addressed all your comments in the new commit. Major changes are highlighted below: 1. Create

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-25 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @zhengruifeng @imatiach-msft Thanks much for pointing out the issue due to the hasOffset trait. This is what caused the test to fail. I have moved it to the GLRBase class. Things

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 This test just resists to start. Could someone help? Many thanks! @srowen @jkbradley @MLnick @yanboliang --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #16699: [SPARK-18710] Add offset in GLM

2017-01-24 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16699 [SPARK-18710] Add offset in GLM ## What changes were proposed in this pull request? Add support for offset in GLM. This is useful for at least two reasons: 1. Account for exposure

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-23 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM

2017-01-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675 @yanboliang Thanks. Seems to have passed tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks so much for your detailed review. Your suggestions make lots of sense and I have included all of them in the new commit. Let me know if there is any other change needed

[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM

2017-01-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675 @yanboliang Thanks for the quick response. How about the new commit, where I just change the value from `getFamily` to lower case when necessary, i.e., in the calculation of p-value

[GitHub] spark issue #16675: [SPARK-19155][ML] Make family case insensitive in GLM

2017-01-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16675 I would prefer that `getFamily` returns lower case values directly, because using `getFamily.toLowerCase` can get very cumbersome and I use this a lot in another PR #16344. If we want to keep

[GitHub] spark pull request #16675: [SPARK-19155][ML] make getFamily case insensitive

2017-01-22 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16675 [SPARK-19155][ML] make getFamily case insensitive ## What changes were proposed in this pull request? This is a supplement to PR #16516 which did not make the value from `getFamily` case

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 jenkins test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Could anybody help me understand what's causing this test to fail? I see several other ML PR failing as well, with the same error message like below: > Error instrument

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-18 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen @yanboliang @felixcheung @jkbradley Could you help kick off the new test please? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-18 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen @yanboliang @felixcheung Could you help kick off the new test please? Seems to be hanging for a day now. Thanks much. --- If your project is set up for it, you can reply

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-18 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 The following code illustrates the idea of this PR. ``` val datasetWithWeight = Seq( (1.0, 1.0, 0.0, 5.0), (0.5, 2.0, 1.0, 2.0), (1.0, 3.0, 2.0, 1.0

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-01-17 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16630 [SPARK-19270][ML] Add summary table to GLM summary ## What changes were proposed in this pull request? Add R-like summary table to GLM summary, which includes feature name (if exist

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Finally, the test is done. Is there anything else needed for this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Still not testing... Been in the status "Asked to test" for a few days now. How can we resolve this? Please help kick off the test. Thanks! @yanboliang @felixcheung @sr

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-15 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-14 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-13 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks for the review and comments. I have made a new commit that addressed all your comments. The main change is the new companion object `FamilyAndLink` and factory methods

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-13 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r96061883 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -242,9 +316,9 @@ class

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-13 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r96061873 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -613,25 +758,67 @@ object

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-11 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks. Look forward to your feedback. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-10 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Sorry about closing this prematurely. I'm giving it another shot and I think I have an elegant solution to include `linkPower`. The new commit adds the following: 1. It implements

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-09 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-09 Thread actuaryzhang
GitHub user actuaryzhang reopened a pull request: https://github.com/apache/spark/pull/16344 [SPARK-18929][ML] Add Tweedie distribution in GLM ## What changes were proposed in this pull request? I propose to add the full Tweedie family into the GeneralizedLinearRegression model

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen @yanboliang I'm closing this PR since it does not seem to be very clean to integrate into the current GLM setup. I appreciate all the comments and discussions. --- If your

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-06 Thread actuaryzhang
Github user actuaryzhang closed the pull request at: https://github.com/apache/spark/pull/16344 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks for the feedback. However, I'm not sure why we need to be consistent with R on this one. The usage of 'tweedie' glm almost always uses `link.power = 0, 1, -1

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-05 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r94849556 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -397,32 +432,121 @@ object

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-05 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r94849540 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -365,7 +401,6 @@ object

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-05 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r94849501 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -158,6 +183,16 @@ class

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks for the detailed review. I have made all changes you suggested except for the part on the new power link function. Yes, the canonical link in the Tweedie in general is `1.0

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-03 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Did you get a chance to take another look at this? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-27 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen Made a new commit according to your suggestion. Everything looking good now? @yanboliang --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-26 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen @yanboliang Any additional issues regarding this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-23 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @yanboliang Thanks much for the detailed comments. I have addressed all of them in the new commits. Please take another look. @srowen --- If your project is set up for it, you can

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-22 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93672741 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -303,20 +337,24 @@ object

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-22 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen Thanks for the comments. Makes lots of sense to move the switch to subclass. I did not know one could override a `val`. In the new commit, I have moved the `defaultLink

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-21 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen @yanboliang Thanks much for the feedback. I now have a better understanding of the code and the issue. I have made new commits reflecting your suggestions. The major changes

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-21 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93565567 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -303,14 +341,15 @@ object

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-21 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93565335 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -64,6 +64,27 @@ private[regression] trait

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-21 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93486159 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -303,14 +341,15 @@ object

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-21 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93482978 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -242,7 +275,12 @@ class

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-20 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 @srowen Thanks for the comments. Really helpful. I have made a new commit that addresses the issues you raised: - I think the use of a global family object does not work well

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-20 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93290858 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -592,6 +629,59 @@ object

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-20 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16344#discussion_r93289668 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -592,6 +629,59 @@ object

[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-19 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, add to whitelist --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2016-12-19 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16344 [SPARK-18929][ML] Add Tweedie distribution in GLM ## What changes were proposed in this pull request? I propose to add the full Tweedie family into the GeneralizedLinearRegression model

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-13 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah Thanks for all the helpful discussions! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-12 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah One more commit that adds a test case with `weight = 4.7` which will round up to 5 to test the case @sethah described. All tests passed. I'm pretty sure R's rounding

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-10 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah Would you please review this? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-08 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah @srowen I updated the documentation. I think we have everything needed for this fix. Please merge and close this PR if there is no other issue. Thanks much for all the comments

[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-08 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91643659 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -468,11 +469,7 @@ object

[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-08 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91643515 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -715,7 +715,7 @@ class

[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-08 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91563322 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -715,7 +715,7 @@ class

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah @srowen I have added a comment to the weigthCol doc for the Binomial case. I also updated to test the case `weight < 0.5`, i.e., `round(weight) = 0`. All tests pas

[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91440862 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -479,7 +485,12 @@ object

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah I have cleaned up the change as suggested. Please review and let me know if there is any question. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah Thanks for the comments. Yes, the major use case is to be able to handle multiple trials (integer weight, real-valued response). Indeed, a better way to do this is through

[GitHub] spark pull request #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial...

2016-12-07 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16149#discussion_r91246870 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -479,7 +479,12 @@ object

[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131 @srowen Is this ready to be merged? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131 @srowen Done. Thanks for the suggestion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131 @sethah Thanks for the review. I have updated according to your suggestion. @yanboliang @srowen Please take another look. Thanks. --- If your project is set up for it, you

[GitHub] spark issue #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to wrong i...

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16131 @srowen @yanboliang I have updated the code and further cleaned up the test. Please review and let me know if there is any question. Thanks. --- If your project is set up for it, you

[GitHub] spark pull request #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to ...

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16131#discussion_r90965352 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -505,7 +505,7 @@ object

[GitHub] spark pull request #16131: [SPARK-18701][ML] Fix Poisson GLM failure due to ...

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/16131#discussion_r90955908 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -505,7 +505,7 @@ object

[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM

2016-12-05 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 Jenkins, add to whitelist --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

<    1   2   3   4   5   6   >