Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16740
LGTM
Merging with master
Thank you + @sethah for reviewing!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@jkbradley Thanks much for the review and suggestion. I updated the error
message. Please let me know if there's anything else needed for this PR. Thanks.
---
If your project is set up for
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72559/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72559 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)**
for PR 16740 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72559 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72559/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
Can one of the admins merge this PR since we have two approvals now?
Thanks.
@srowen @jkbradley @felixcheung @yanboliang
---
If your project is set up for it, you can reply to this
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72487/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72487 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Thanks for the comments.
OK, added more tests to cover all families. It's not possible to test all
family and link combination if that's what you mean: the tweedie family
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72487 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)**
for PR 16740 at commit
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Regarding the tests - I don't think the tests should change _depending on_
the implementation. I don't think it's valid to say that we don't need to test
this thoroughly because we know that it's
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16740
@actuaryzhang the changes look good to me. I had some nit-picks which you
marked as won't fix, and I'm ok with that. Thank you for fixing this issue!
Maybe a committer can review this -
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah @imatiach-msft Could you take another look and let me know if there
are any additional changes needed on this PR? Thanks!
---
If your project is set up for it, you can reply to this
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72360/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72360 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)**
for PR 16740 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72360 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72360/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Thanks for the review. Made changes you suggested (except for the
nit part). I added more tests although I don't think they are really necessary.
The analytical approach is taking a
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@seth @imatiach-msft Let me know if there is any other changes needed.
Thanks much for your review!
---
If your project is set up for it, you can reply to this email and have your
reply
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72299/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72299 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)**
for PR 16740 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72298/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72298 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)**
for PR 16740 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72299 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)**
for PR 16740 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72298 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)**
for PR 16740 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72296/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72296 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)**
for PR 16740 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72296 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@srowen would you please take a look and merge this if all is good? Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Ok, yeah, let's go with this fix now then - seems both R and statsmodels
fit to compute the null model. Thanks for following up on that!
---
If your project is set up for it, you can reply to this
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Your formula for offset does not seem to be a general solution, and
I'm not sure if there exists an analytical formula, in particular when the link
function is not identity or log. In
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Thanks for the clarification and providing an implementation. So,
the pros is some speed improvement and the cons is the increased complexity
(now we have three case: one for intercept
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
I agree having a special case is unsatisfying from an engineering
perspective. In Spark it's a bit different than R since every iteration of IRLS
will launch a Spark job, making a pass over the
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Thanks for your input. I can add more tests, but they are not
adding too much since the algorithm is already tested in other tests.
The analytical approach does not integrate
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
I don't really expect that we'll be changing things so often that this
becomes a hassle. I think there is value in getting known results - in the
current test the IRLS solver takes 3 iterations to
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Yes, we can directly compute the intercept easily. But I'm
concerned that such special handling may not integrate well with other features
or future changes. For example, we will need
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Allowing offset will only require a small change to the intercept
calculation, won't it?
scala
val agg = data.agg(sum(w * (col("label") - col("label"))), sum(w)).first()
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72185/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72185 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@sethah Thanks for your review. Yes, using `foldLeft` would be the simplest
fix. I have included both your suggested changes in the new commit.
Yes, we could handle the special case
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72185 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72185/testReport)**
for PR 16740 at commit
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/16740
Since we already compute the number of features in the train method, why
don't we just check if `numFeatures == 0` and then just compute the intercept
as the link of the weighted average of the
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72161/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72161 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)**
for PR 16740 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72161 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72161/testReport)**
for PR 16740 at commit
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@imatiach-msft Thanks for the comments! This test is based on existing
tests in GLM. I can try to improve the style and streamline **all** tests in
another PR but it will be weird to just
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16740
@srowen Thanks much for the suggestion. Included the simplification. Please
let me know if there is anything else needed for this PR. Thanks!
---
If your project is set up for it, you can
Github user imatiach-msft commented on the issue:
https://github.com/apache/spark/pull/16740
Added a few nit-pick comments, otherwise the changes LGTM!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72148 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)**
for PR 16740 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16740
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72148/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16740
**[Test build #72148 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72148/testReport)**
for PR 16740 at commit
59 matches
Mail list logo