[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #93137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93137/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92543/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92543/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/628/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-07-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92543/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92487/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92487/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/593/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 >Hey @BryanCutler, btw, mind if I ask move the benchmarks into the PR description? Yeah, I guess it doesn't hurt. I'll copy them there. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92432/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92432/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/551/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92432/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21546 Hey @BryanCutler, btw, mind i fI ask move the benchmarks into the PR description? --- - To unsubscribe, e-mail:

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92395/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92395/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 ### Memory Improvements **toPandas()** The most significant improvement is reduction of the upper bound space complexity in the JVM driver. Before, the entire dataset was

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 ### Performance Tests - createDataFrame Tests run on a 4 node standalone cluster with 32 cores total, 14.04.1-Ubuntu and OpenJDK 8 measured wall clock time to execute

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 ### Performance Tests - toPandas Tests run on a 4 node standalone cluster with 32 cores total, 14.04.1-Ubuntu and OpenJDK 8 measured wall clock time to execute `toPandas()` and took

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 _UPDATE_ I changed `toPandas` to write out of order partitions to python as they come in, followed by a list of indices to represent the correct batch order. In python, the batches are

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92395/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/523/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92233/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92233/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #92233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92233/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/423/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 Thanks @ueshin and @icexelloss ! I do have some benchmarks that I'll post soon.. --- - To unsubscribe, e-mail:

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21546 > Btw, do you have benchmark number or something yet? ++1 --- - To unsubscribe, e-mail:

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91862/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #91862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91862/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/150/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4041/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 cc @icexelloss --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 Thanks @viirya and @HyukjinKwon ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #91862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91862/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21546 cc @ueshin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91733/ Test PASSed. ---

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #91733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91733/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/67/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #91733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91733/testReport)** for PR 21546 at commit

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3956/

[GitHub] spark issue #21546: [WIP][SPARK-23030][SQL][PYTHON] Use Arrow stream format ...

2018-06-12 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 This is a WIP because I had to hack up some of the message processing code in Arrow. This should be done in Arrow, and then it can be cleaned up here. I will make these changes for version