Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
Merged to master. Thanks @hhbyyh, and thanks to everyone for the reviews.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74651/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74651 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74651/testReport)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
Created [SPARK-19969](https://issues.apache.org/jira/browse/SPARK-19969) to
track doc and examples to be done for 2.2 release. I can help with this if
you're tied up.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74651 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74651/testReport)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
jenkins retest this please
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Thanks @MLnick for being the shepherd and for your consistent help with
discussion and review. The performance test matches what I got in my local
environment.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74216/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74216 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74216/testReport)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74216 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74216/testReport)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
By the way, out of curiosity, I tested things out on a cluster (4 workers,
192 cores and 480 GB RAM total), with 100 columns of 100 million doubles each and
1% `NaN` occurrence. Reading from a Parquet
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
Made a few last comments. LGTM.
cc @sethah @jkbradley I am going to merge this for 2.2. Let me know if you
have any final comments.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74038/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74038 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74038/testReport)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #74038 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74038/testReport)**
for PR 11601 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73868/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #73868 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73868/testReport)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #73868 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73868/testReport)**
for PR 11601 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Hi @MLnick, in the last update I changed the surrogateDF format for better
extensibility and added unit tests for multi-column support. Let me know if I
missed anything.
inputCol1|inputCol2
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Thanks a lot for making a pass, @MLnick. The last update mainly focuses on the
interface and behavior changes. I'll make a pass and also address your comments.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73753/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #73753 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73753/testReport)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #73753 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73753/testReport)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
jenkins retest this please
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Looks like CI was interrupted.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/console
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73268/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test FAILed.
---
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Sent an update to add multi-column support. Let me know if this is not what
you have in mind.
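For readers following the multi-column change, here is a minimal sketch of mean imputation across several columns. It is written in plain Python, not Spark. The sketch computes one surrogate per input column, skips missing entries, and keeps the surrogates in a single mapping keyed by input column name, loosely mirroring a one-row surrogate table with one column per input column. The names `fit_mean_surrogates` and `transform`, the list-of-dicts row representation, and the rule that only the configured missing value (NaN by default) counts as missing are all hypothetical. None of them are the `Imputer` API.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, float]

def is_missing(x: float, missing_value: float) -> bool:
    # NaN != NaN, so a NaN placeholder must be checked with math.isnan.
    if math.isnan(missing_value):
        return math.isnan(x)
    return x == missing_value

def fit_mean_surrogates(rows: Sequence[Row], input_cols: List[str],
                        missing_value: float = float("nan")) -> Dict[str, float]:
    """Compute one surrogate (the mean of non-missing values) per input column."""
    surrogates: Dict[str, float] = {}
    for col in input_cols:
        present = [r[col] for r in rows if not is_missing(r[col], missing_value)]
        if not present:
            raise ValueError(f"column {col!r} has no non-missing values")
        surrogates[col] = sum(present) / len(present)
    return surrogates

def transform(rows: Sequence[Row], input_cols: List[str], output_cols: List[str],
              surrogates: Dict[str, float],
              missing_value: float = float("nan")) -> List[Row]:
    """Copy each row and write the (possibly imputed) value into its output column."""
    out: List[Row] = []
    for r in rows:
        new_row = dict(r)
        for in_col, out_col in zip(input_cols, output_cols):
            v = r[in_col]
            new_row[out_col] = surrogates[in_col] if is_missing(v, missing_value) else v
        out.append(new_row)
    return out

# Usage: two input columns, each with one NaN.
nan = float("nan")
rows = [{"a": 1.0, "b": nan}, {"a": nan, "b": 4.0}, {"a": 3.0, "b": 6.0}]
surrogates = fit_mean_surrogates(rows, ["a", "b"])
print(surrogates)  # {'a': 2.0, 'b': 5.0}
imputed = transform(rows, ["a", "b"], ["a_out", "b_out"], surrogates)
```

Fitting every column in one pass and storing the results in one keyed structure makes it straightforward to add columns without changing the model's shape. That appears to be the extensibility concern behind the surrogate format discussed in this thread.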
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #73268 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73268/testReport)**
for PR 11601 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66516/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #66516 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66516/consoleFull)**
for PR 11601 at commit
Github user hhbyyh commented on the issue:
https://github.com/apache/spark/pull/11601
Thanks for the comments @MLnick @jkbradley @sethah
I have sent an update addressing the comments and changed
`ImputerModel.surrogate` and the persistence format to a DataFrame.
As for the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #66516 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66516/consoleFull)**
for PR 11601 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66476/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #66476 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #66476 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
The reason we didn't support mode was partly due to time and mostly due to
not being certain about the performance (e.g. if mode was called on a
non-categorical double column it could become quite
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/11601
I agree we should plan to support multiple columns and Vector columns in
the future. The 2 places I noticed that may cause problems in the future are:
* `ImputerModel.surrogate`: This is nice
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/11601
I'll make a review pass now
---
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/11601
So, I am trying to refresh my memory on this PR. I see we settled on not
supporting vector type and not supporting mode. Did we ever settle on
supporting multiple input columns? I am not sure I see
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
@hhbyyh it seems the behavior of approximate quantiles may have changed
somewhere. Can you take a look?
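This comment suggests the median strategy relies on approximate quantiles. Assuming that is the case, an exact equality test on the median is fragile. A sturdier check compares the returned value against the rank guarantee documented for Spark's `approxQuantile`: for N elements, probability p and relative error err, the result x satisfies floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).

The sketch below is plain Python, not Spark code. Several details are assumptions for illustration:

- the name `within_quantile_bound`;
- 1-based ranks, treated as a range when x is duplicated;
- the default err of 0.001;
- excluding both NaN and the configured missing value before ranking.

```python
import bisect
import math
from typing import List, Sequence

def non_missing(values: Sequence[float], missing_value: float) -> List[float]:
    # Drop NaN and the configured placeholder; if the placeholder is NaN,
    # `v != missing_value` is always True, so only the isnan check applies.
    return [v for v in values if not math.isnan(v) and v != missing_value]

def within_quantile_bound(values: Sequence[float], x: float, p: float = 0.5,
                          err: float = 0.001,
                          missing_value: float = float("nan")) -> bool:
    """Check x against floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)."""
    s = sorted(non_missing(values, missing_value))
    n = len(s)
    if n == 0:
        raise ValueError("no non-missing values")
    # 1-based ranks occupied by x in the sorted data (a range if x repeats).
    first = bisect.bisect_left(s, x) + 1
    last = bisect.bisect_right(s, x)
    if first > last:
        return False  # x does not occur in the data at all
    lo = math.floor((p - err) * n)
    hi = math.ceil((p + err) * n)
    return first <= hi and last >= lo

nan = float("nan")
sample = [5.0, 1.0, nan, 3.0, 2.0, 4.0]
print(within_quantile_bound(sample, 3.0))  # True
print(within_quantile_bound(sample, 5.0))  # False
```

A test written this way keeps passing when the quantile implementation changes but still honors its documented contract. It fails only when the returned surrogate falls outside that bound.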
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65908/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #65908 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65908/consoleFull)**
for PR 11601 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #65908 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65908/consoleFull)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
jenkins retest this please
---
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
Sorry for the delay, I've been a bit tied up! Overall looks good. I'll leave it
open for a day or two so @sethah or @jkbradley can make any final comments.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/11601
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65053/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #65053 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65053/consoleFull)**
for PR 11601 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/11601
**[Test build #65053 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65053/consoleFull)**
for PR 11601 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/11601
@hhbyyh could you update the `since` annotations to target `2.1.0`?
@jkbradley, if you have a chance to review, that would be great. Thanks!
---