[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18798 Merged into master, thanks for all. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/18798 Thank you @yanboliang. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80671/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80671/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80669/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80669/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80671/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80669/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80668/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80668/ Test FAILed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80668/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @yanboliang I will update ASAP, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18798 @WeichenXu123 I left some minor comments, otherwise, LGTM. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18798 @WeichenXu123 Thanks! Looks good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @viirya Sure! comment updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18798 Sorry can we make the performance data clear? Currently it doesn't say what the unit of the numbers is. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-11 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18798 @thunterdb I'm on travel these days, will do a final pass and merge it on next Monday/Tuesday. Thanks. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-10 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/18798 @yanboliang do you feel comfortable to merge this PR? I think that all the questions have been addressed. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80407/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80407/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80407/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80403/ Test FAILed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80403/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-08 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80403/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80363/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80363/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80359/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80359/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80363/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-02 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18798 @WeichenXu123 @thunterdb Thanks for this great work, we are always happy to see improvement which can help us to migrate MLlib workload to Dataset based API. Here are my two cents:

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 @thunterdb 1) The dataframe deserialize from binary data will add overhead, (maybe there is compaction or not, it depends on the datatype, cc @liancheng ) about 1x performance in my test.

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/18798 cc @hvanhovell as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/18798 Thank you for the performance numbers @WeichenXu123 , I have a couple of comments: - you say that SQL uses adaptive compaction. How bad is that? I assume it adds some overhead. - did

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18798 performance data attached. cc @thunterdb @jkbradley --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80126/ Test PASSed. ---

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18798 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80126/testReport)** for PR 18798 at commit

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/18798 @WeichenXu123 thanks! Can you post some performance numbers as well? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #18798: [SPARK-19634][ML] Multivariate summarizer - dataframes A...

2017-08-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18798 **[Test build #80126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80126/testReport)** for PR 18798 at commit