[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16740 LGTM Merging with master Thank you + @sethah for reviewing! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @jkbradley Thanks much for the review and suggestion. I updated the error message. Please let me know if there's anything else needed for this PR. Thanks. --- If your project is set up for

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72559/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 Can one of the admins merge this PR since we have two approvals now? Thanks. @srowen @jkbradley @felixcheung @yanboliang --- If your project is set up for it, you can reply to this

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72487/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for the comments. OK, added more tests to cover all families. It's not possible to test all family and link combination if that's what you mean: the tweedie family

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Regarding the tests - I don't think the tests should change _depending on_ the implementation. I don't think it's valid to say that we don't need to test this thoroughly because we know that it's

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16740 @actuaryzhang the changes look good to me. I had some nit-picks which you marked as won't fix, and I'm ok with that. Thank you for fixing this issue! Maybe a committer can review this -

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah @imatiach-msft Could you take another look and let me know if there are any additional changes needed on this PR? Thanks! --- If your project is set up for it, you can reply to this

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72360/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-03 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for the review. Made changes you suggested (except for the nit part). I added more tests although I don't think they are really necessary. The analytical approach is taking a

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @seth @imatiach-msft Let me know if there is any other changes needed. Thanks much for your review! --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72299/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72298/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72296/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @srowen would you please take a look and merge this if all is good? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Ok, yeah, let's go with this fix now then - seems both R and statsmodels fit to compute the null model. Thanks for following up on that! --- If your project is set up for it, you can reply to this

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Your formula for offset does not seem to be a general solution, and I'm not sure if there exists an analytical formula, in particular when the link function is not identity or log. In

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for the clarification and providing an implementation. So, the pros is some speed improvement and the cons is the increased complexity (now we have three case: one for intercept

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 I agree having a special case is unsatisfying from an engineering perspective. In Spark it's a bit different than R since every iteration of IRLS will launch a Spark job, making a pass over the

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for your input. I can add more tests, but they are not adding too much since the algorithm is already tested in other tests. The analytical approach does not integrate

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 I don't really expect that we'll be changing things so often that this becomes a hassle. I think there is value in getting known results - in the current test the IRLS solver takes 3 iterations to

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Yes, we can directly compute the intercept easily. But I'm concerned that such special handling may not integrate well with other features or future changes. For example, we will need

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-31 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Allowing offset will only require a small change to the intercept calculation, won't it? scala val agg = data.agg(sum(w * (col("label") - col("label"))), sum(w)).first()

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72185/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @sethah Thanks for your review. Yes, using `foldLeft` would be the simplest fix. I have included both your suggested changes in the new commit. Yes, we could handle the special case

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16740 Since we already compute the number of features in the train method, why don't we just check if `numFeatures == 0` and then just compute the intercept as the link of the weighted average of the

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72161/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @imatiach-msft Thanks for the comments! This test is based on existing tests in GLM. I can try to improve the style and streamline **all** tests in another PR but it will be weird to just

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread actuaryzhang
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16740 @srowen Thanks much for the suggestion. Included the simplification. Please let me know if there is anything else needed for this PR. Thanks! --- If your project is set up for it, you can

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16740 Added a few nit-pick comments, otherwise the changes LGTM! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)** for PR 16740 at commit

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72148/ Test PASSed. ---

[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-01-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16740 **[Test build #72148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)** for PR 16740 at commit