[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20428 **[Test build #86798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86798/testReport)** for PR 20428 at commit [`7a71c5a`](https://github.com/apache/spark/commit/7a71c5a294da230faf19965dc1d068adc3678411). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20432 **[Test build #86797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86797/testReport)** for PR 20432 at commit [`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20431 **[Test build #86796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86796/testReport)** for PR 20431 at commit [`9a4a484`](https://github.com/apache/spark/commit/9a4a4842b3f8281e73e564f4dfdad92017630760). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #86799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86799/testReport)** for PR 20433 at commit [`830cf8d`](https://github.com/apache/spark/commit/830cf8d014ae17ade5fd771ca98c8c846c93). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86801/testReport)** for PR 20386 at commit [`42dc690`](https://github.com/apache/spark/commit/42dc69004ad37a5c4a5d8c96478a875ff4baed4e). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20427 **[Test build #86793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86793/testReport)** for PR 20427 at commit [`b4fdbbe`](https://github.com/apache/spark/commit/b4fdbbe265943012093fbc0f54e8b22184fa2987). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20343 **[Test build #86800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86800/testReport)** for PR 20343 at commit [`d04b087`](https://github.com/apache/spark/commit/d04b0872bcc02b5eadd309c560cda77ff1b8da0a). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86801/ Test FAILed.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86798/ Test FAILed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86797/ Test FAILed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86800/ Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86799/ Test FAILed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Merged build finished. Test FAILed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Merged build finished. Test FAILed.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20431 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86796/ Test FAILed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Merged build finished. Test FAILed.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20428 Merged build finished. Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20431 Merged build finished. Test FAILed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20427 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86793/ Test FAILed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20427 Merged build finished. Test FAILed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20427 retest this please.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20432 Jenkins, retest this please.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20428 Jenkins, retest this please.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/367/ Test PASSed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20432 Merged build finished. Test PASSed.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/368/ Test PASSed.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20428 Merged build finished. Test PASSed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/369/ Test PASSed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20427 **[Test build #86805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86805/testReport)** for PR 20427 at commit [`b4fdbbe`](https://github.com/apache/spark/commit/b4fdbbe265943012093fbc0f54e8b22184fa2987).
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20428 **[Test build #86804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86804/testReport)** for PR 20428 at commit [`7a71c5a`](https://github.com/apache/spark/commit/7a71c5a294da230faf19965dc1d068adc3678411).
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Merged build finished. Test PASSed.
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20427 Merged build finished. Test PASSed.
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/370/ Test PASSed.
[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20432 **[Test build #86803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86803/testReport)** for PR 20432 at commit [`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c).
[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20422 **[Test build #86806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86806/testReport)** for PR 20422 at commit [`6196770`](https://github.com/apache/spark/commit/61967706c6f3804a84819f8484abeff5d1d77eea).
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/20421 @felixcheung just added a few more behavior changes I found. Should be final now.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Merged build finished. Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/371/ Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20421 **[Test build #86807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86807/testReport)** for PR 20421 at commit [`4a957f6`](https://github.com/apache/spark/commit/4a957f677eadfa5345a62f78b254c999869a1940).
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Merged build finished. Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/372/ Test PASSed.
[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20428 LGTM.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20421 **[Test build #86808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86808/testReport)** for PR 20421 at commit [`469d87d`](https://github.com/apache/spark/commit/469d87db6278da7f157d8e6c81e7a26c1b969e7c).
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20386 retest this please.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Merged build finished. Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86807/ Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20421 **[Test build #86807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86807/testReport)** for PR 20421 at commit [`4a957f6`](https://github.com/apache/spark/commit/4a957f677eadfa5345a62f78b254c999869a1940). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/373/ Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20386 Merged build finished. Test PASSed.
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20386 **[Test build #86809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86809/testReport)** for PR 20386 at commit [`42dc690`](https://github.com/apache/spark/commit/42dc69004ad37a5c4a5d8c96478a875ff4baed4e).
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20343 retest this please.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 retest this please.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20431 retest this please.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Merged build finished. Test PASSed.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20421 **[Test build #86808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86808/testReport)** for PR 20421 at commit [`469d87d`](https://github.com/apache/spark/commit/469d87db6278da7f157d8e6c81e7a26c1b969e7c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86808/ Test PASSed.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20431 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/375/ Test PASSed.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20431 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/374/ Test PASSed.
[GitHub] spark pull request #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not f...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20431#discussion_r164670510

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
    @@ -154,7 +154,7 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventuall
       test("Cancelling stage in a query with Range.") {
         val listener = new SparkListener {
           override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    -        eventually(timeout(10.seconds)) {
    +        eventually(timeout(10.seconds), interval(1.millis)) {
    --- End diff --

    The default interval is 15 ms, IIUC, so it is quite possible that the range stage finishes within a single interval. I reduced it to 1 ms.
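To illustrate why the polling interval matters here, the following is a minimal Python sketch of an `eventually`-style retry loop. It is not ScalaTest's implementation; the function name, the defaults (the 0.015 s interval mirrors the 15 ms mentioned in the comment above), and the `stage_is_running` check are all assumptions made for the example. A shorter interval means the check is re-run sooner, so a short-lived condition is less likely to come and go between two polls.

```python
import time


def eventually(check, timeout=0.15, interval=0.015):
    """Re-run `check` until it stops raising AssertionError or `timeout` (seconds) elapses.

    Returns the number of attempts that were needed.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            check()
            return attempts
        except AssertionError:
            # Give up and surface the last failure once the deadline has passed.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Hypothetical condition that only holds from the third poll onwards.
state = {"calls": 0}


def stage_is_running():
    state["calls"] += 1
    assert state["calls"] >= 3, "stage not observed yet"


# Analogous to eventually(timeout(10.seconds), interval(1.millis)) in the diff.
attempts = eventually(stage_is_running, timeout=10.0, interval=0.001)
print(attempts)
```

With a 1 ms interval the three polls complete in a few milliseconds; with a 15 ms interval the same three polls would take roughly 30 ms, which may be longer than a small range stage runs.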
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/376/ Test PASSed.
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20343 Merged build finished. Test PASSed.
[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20431 **[Test build #86811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86811/testReport)** for PR 20431 at commit [`9a4a484`](https://github.com/apache/spark/commit/9a4a4842b3f8281e73e564f4dfdad92017630760).
[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20343 **[Test build #86812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86812/testReport)** for PR 20343 at commit [`d04b087`](https://github.com/apache/spark/commit/d04b0872bcc02b5eadd309c560cda77ff1b8da0a).
[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #86810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86810/testReport)** for PR 20433 at commit [`830cf8d`](https://github.com/apache/spark/commit/830cf8d014ae17ade5fd771ca98c8c846c93).
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20434

    [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535

    ## What changes were proposed in this pull request?
    We still saw the performance regression introduced by `spark.sql.codegen.hugeMethodLimit` in our internal workloads. There are two major issues in the current solution.
    - The size of the compiled bytecode is not identical to the bytecode size of the method, so the detection is still not accurate.
    - The bytecode size of a single operator (e.g., `SerializeFromObject`) could still exceed the 8K limit. We saw the performance regression in such scenarios.

    Since we are close to the 2.3 release, we decided to increase it to 64K to avoid the perf regression.

    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark revertConf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20434

commit 4b358dc8cbaba5603ce861623403db8b146f9337
Author: gatorsmile
Date:   2018-01-30T08:30:50Z

    fix
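For anyone who wants to try a different threshold against their own workload, the setting named in this pull request can be overridden per job. The snippet below is only a sketch: it assumes the job is launched with `spark-submit` and its standard `--conf key=value` option, and the helper `conf_flags` and the script name `app.py` are hypothetical names introduced for the example.

```python
def conf_flags(settings):
    """Render a dict of Spark settings as a list of `--conf key=value` arguments."""
    flags = []
    for key, value in sorted(settings.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# The value proposed in the pull request title.
settings = {"spark.sql.codegen.hugeMethodLimit": 65535}
flags = conf_flags(settings)

# `app.py` is a placeholder for the application being submitted.
print(" ".join(["spark-submit"] + flags + ["app.py"]))
```

Keeping the settings in a dict makes it easy to run the same workload with several candidate limits and compare the timings.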
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20434 **[Test build #86813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86813/testReport)** for PR 20434 at commit [`4b358dc`](https://github.com/apache/spark/commit/4b358dc8cbaba5603ce861623403db8b146f9337).
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20434 cc @sameeragarwal @zsxwing @rxin @cloud-fan @rednaxelafx @yhuai
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Merged build finished. Test PASSed.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/377/ Test PASSed.
[GitHub] spark pull request #20378: [SPARK-11222][Build][Python] Python document styl...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20378#discussion_r164672962 --- Diff: dev/run-tests.py --- @@ -576,7 +576,10 @@ def main(): for f in changed_files): # run_java_style_checks() pass -if not changed_files or any(f.endswith(".py") for f in changed_files): +if not changed_files or any(f.endswith("lint-python") +or f.endswith("tox.ini") +or f.endswith(".py") +for f in changed_files): --- End diff -- Can you resolve the conflict? It looks like this change was already merged in #20338.
[GitHub] spark issue #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkSession ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20404 @felixcheung I see. In that case, should we revert the last commit (cc4b8510c1445fb742c0d750958d352adfa84902) to check whether the default session is updated?
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/20434#discussion_r164676771 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -660,12 +660,10 @@ object SQLConf { val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit") .internal() .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " + - "codegen. When the compiled function exceeds this threshold, " + - "the whole-stage codegen is deactivated for this subtree of the current query plan. " + - s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " + - "this is a limit in the OpenJDK JVM implementation.") --- End diff -- nit: might want to still keep the last line around to indicate where the 64k limit is coming from
[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20427#discussion_r164676886 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -23,7 +23,7 @@ import org.apache.spark.sql.sources.v2.reader._ case class DataSourceV2Relation( --- End diff -- This is internal, we can clean it up at any time. I wanna focus on public APIs in this PR.
[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20434 LGTM
[GitHub] spark issue #20378: [SPARK-11222][Build][Python] Python document style check...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20378 One question I have is: do the current violations cause significant documentation errors? Overall this is a good idea. However, is it worth enforcing this if we consider the effort of fixing the violations and the difficulty of backporting in the future?
[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20427 LGTM
[GitHub] spark pull request #20231: [SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suit...
Github user sameeragarwal closed the pull request at: https://github.com/apache/spark/pull/20231
[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18339 Can one of the admins verify this patch?
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164680081 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -54,10 +54,6 @@ case class WriteToDataSourceV2Exec(writer: DataSourceV2Writer, query: SparkPlan) } val rdd = query.execute() -val messages = new Array[WriterCommitMessage](rdd.partitions.length) - -logInfo(s"Start processing data source writer: $writer. " + - s"The input RDD has ${messages.length} partitions.") --- End diff -- might be good to keep this log.
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164680632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala --- @@ -118,14 +118,21 @@ class MemoryWriter(sink: MemorySinkV2, batchId: Long, outputMode: OutputMode) override def createWriterFactory: MemoryWriterFactory = MemoryWriterFactory(outputMode) - def commit(messages: Array[WriterCommitMessage]): Unit = { + private val messages = new ArrayBuffer[WriterCommitMessage]() + + override def add(message: WriterCommitMessage): Unit = synchronized { +messages += message + } + + def commit(): Unit = synchronized { val newRows = messages.flatMap { case message: MemoryWriterCommitMessage => message.data -} +}.toArray sink.write(batchId, outputMode, newRows) +messages.clear() } - override def abort(messages: Array[WriterCommitMessage]): Unit = { + override def abort(): Unit = { --- End diff -- ditto
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164680538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriter.scala --- @@ -39,13 +41,20 @@ class ConsoleWriter(schema: StructType, options: DataSourceV2Options) def createWriterFactory(): DataWriterFactory[Row] = PackedRowWriterFactory - override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = { + private val messages = new ArrayBuffer[WriterCommitMessage]() + + override def add(message: WriterCommitMessage): Unit = synchronized { +messages += message + } + + override def commit(epochId: Long): Unit = synchronized { // We have to print a "Batch" label for the epoch for compatibility with the pre-data source V2 // behavior. -printRows(messages, schema, s"Batch: $epochId") +printRows(messages.toArray, schema, s"Batch: $epochId") +messages.clear() } - def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {} + def abort(epochId: Long): Unit = {} --- End diff -- we should clear the message array in abort too.
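The point about clearing in `abort` can be illustrated with a minimal sketch. It assumes simplified stand-in types: `CommitMessage`, `RowsMessage` and `BufferingWriter` are hypothetical names, not Spark's data source V2 interfaces. It only shows that both `commit` and `abort` reset the buffer that `add` fills.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the data source V2 types; not Spark's real API.
object BufferedWriterSketch {
  sealed trait CommitMessage
  final case class RowsMessage(partition: Int, rows: Seq[Int]) extends CommitMessage

  class BufferingWriter(sink: ArrayBuffer[Int]) {
    private val messages = new ArrayBuffer[CommitMessage]()

    // Called once per finished task; synchronized because tasks may report concurrently.
    def add(message: CommitMessage): Unit = synchronized {
      messages += message
    }

    // Flush everything buffered so far into the sink, then reset the buffer.
    def commit(): Unit = synchronized {
      sink ++= messages.flatMap { case RowsMessage(_, rows) => rows }
      messages.clear()
    }

    // Drop buffered messages so an aborted attempt cannot leak into a later commit.
    def abort(): Unit = synchronized {
      messages.clear()
    }

    // Number of messages currently buffered (exposed for illustration only).
    def pending: Int = synchronized(messages.size)
  }
}
```

Without the `clear()` in `abort`, messages added before a failure would still be in the buffer when the next `commit` runs.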
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164680686 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala --- @@ -135,14 +142,21 @@ class MemoryStreamWriter(val sink: MemorySinkV2, outputMode: OutputMode) override def createWriterFactory: MemoryWriterFactory = MemoryWriterFactory(outputMode) - override def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = { + private val messages = new ArrayBuffer[WriterCommitMessage]() + + override def add(message: WriterCommitMessage): Unit = synchronized { +messages += message + } + + override def commit(epochId: Long): Unit = synchronized { val newRows = messages.flatMap { case message: MemoryWriterCommitMessage => message.data -} +}.toArray sink.write(epochId, outputMode, newRows) +messages.clear() } - override def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = { + override def abort(epochId: Long): Unit = { --- End diff -- ditto
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164680877 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala --- @@ -41,19 +41,22 @@ class MemorySinkV2Suite extends StreamTest with BeforeAndAfter { test("continuous writer") { val sink = new MemorySinkV2 val writer = new MemoryStreamWriter(sink, OutputMode.Append()) -writer.commit(0, - Array( -MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))), -MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))), -MemoryWriterCommitMessage(2, Seq(Row(6), Row(7))) - )) +val messages = Seq( + MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))), + MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))), + MemoryWriterCommitMessage(2, Seq(Row(6), Row(7))) +) +messages.foreach(writer.add(_)) --- End diff -- nit:
```
writer.add(MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))))
writer.add(MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))))
...
```
[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20386#discussion_r164681157 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala --- @@ -34,9 +33,9 @@ class ConsoleWriterSuite extends StreamTest { Console.withOut(captured) { val query = input.toDF().writeStream.format("console").start() try { -input.addData(1, 2, 3) +input.addData(1, 1, 1) --- End diff -- why this change?
[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20386 I like this change! It adds a missing feature that is required for migrating the file-based data sources (which use `FileCommitProtocol` and have a callback for task commit), and it also makes it possible to handle large jobs with a lot of tasks. Implementations can externalize the commit messages to avoid keeping too many messages in memory. LGTM, waiting for feedback from others.
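One way to read "externalize the commit messages" is to append each message to local storage as it arrives and stream it back at commit time. The sketch below is only an illustration under that assumption. `FileCommitMessage`, `SpillingWriter` and the tab-separated line format are hypothetical and do not come from the PR.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.io.Source

object SpillingWriterSketch {
  // Hypothetical message: a task id plus the path that task wrote (assumed to contain no tabs).
  final case class FileCommitMessage(taskId: Int, outputPath: String)

  class SpillingWriter {
    // Messages are kept in a temp file instead of an in-memory buffer.
    private val spillFile: Path = Files.createTempFile("commit-messages", ".log")

    // Append one line per message as soon as the task reports it.
    def add(message: FileCommitMessage): Unit = synchronized {
      val line = s"${message.taskId}\t${message.outputPath}\n"
      Files.write(spillFile, line.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND)
    }

    // Stream the messages back one at a time, then truncate the spill file.
    def commit(publish: FileCommitMessage => Unit): Unit = synchronized {
      val source = Source.fromFile(spillFile.toFile, "UTF-8")
      try {
        source.getLines().foreach { line =>
          val parts = line.split("\t", 2)
          publish(FileCommitMessage(parts(0).toInt, parts(1)))
        }
      } finally {
        source.close()
      }
      Files.write(spillFile, Array.emptyByteArray)
    }

    // Discard spilled messages without publishing them.
    def abort(): Unit = synchronized {
      Files.write(spillFile, Array.emptyByteArray)
    }
  }
}
```

With this shape, only one message is held in memory at a time during `commit`, at the cost of extra I/O on every `add`.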
[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20430 CC @wzhfy
[GitHub] spark pull request #20361: [SPARK-23188][SQL] Make vectorized columar reader...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20361#discussion_r164685543 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -377,6 +377,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val PARQUET_VECTORIZED_READER_BATCH_SIZE = buildConf("spark.sql.parquet.batchSize") --- End diff -- I'd say it's very hard. If we need to satisfy a sizeInBytes limitation, we would need to load data record by record and stop loading once we hit the limitation. But for performance reasons, we wanna load the data in batches, which requires knowing the batch size ahead of time.
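The difference between the two limits can be sketched on plain iterators. This is an illustration only and is unrelated to the Parquet reader's actual code. `byRowCount`, `byByteSize` and the `sizeOf` callback are hypothetical names.

```scala
object BatchLoadingSketch {
  // Row-count batching: the batch boundary is known before any record is read.
  def byRowCount[A](records: Iterator[A], batchSize: Int): Iterator[Seq[A]] =
    records.grouped(batchSize)

  // Byte-size batching: every record has to be inspected before deciding whether it fits.
  def byByteSize[A](records: Iterator[A], maxBytes: Long)(sizeOf: A => Long): Iterator[Seq[A]] = {
    val buffered = records.buffered
    new Iterator[Seq[A]] {
      def hasNext: Boolean = buffered.hasNext
      def next(): Seq[A] = {
        val batch = Vector.newBuilder[A]
        var bytes = 0L
        var count = 0
        // Always take at least one record so an oversized record cannot stall the loop.
        while (buffered.hasNext && (count == 0 || bytes + sizeOf(buffered.head) <= maxBytes)) {
          val record = buffered.next()
          bytes += sizeOf(record)
          batch += record
          count += 1
        }
        batch.result()
      }
    }
  }
}
```

The row-count version can allocate its batch up front, while the byte-size version has to peek at each record's size, which is the record-by-record loading the comment describes.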
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19340 @mgaido91 @srowen I have the same concern as @Kevin-Ferret and @viirya. I can't find the normalization of vectors before training, and the update of the centers seems incorrect. The arithmetic mean of all points in the cluster is not naturally the new cluster center: For EUCLIDEAN distance, we need to update the center to minimize the squared loss, and we get the arithmetic mean as the closed-form solution; For COSINE similarity, we need to update the center to *maximize the cosine similarity*, and the solution is the arithmetic mean only if all vectors are of unit length. MATLAB's doc for KMeans says "One minus the cosine of the included angle between points (treated as vectors). Each centroid is the mean of the points in that cluster, after *normalizing those points to unit Euclidean length*." I think RapidMiner's implementation of KMeans with cosine similarity is wrong if it just assigns the arithmetic mean as the new center. Some references: [Spherical k-Means Clustering](https://www.jstatsoft.org/article/view/v050i10/v50i10.pdf) [Scikit-Learn's example: Clustering text documents using k-means](http://scikit-learn.org/dev/auto_examples/text/plot_document_clustering.html) https://stats.stackexchange.com/questions/299013/cosine-distance-as-similarity-measure-in-kmeans https://www.quora.com/How-can-I-use-cosine-similarity-in-clustering-For-example-K-means-clustering
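The concern about the center update can be checked numerically. This is a minimal sketch in plain Scala arrays and is not the code from the PR. All names here (`cosineCentroid`, `totalSimilarity`, and so on) are hypothetical helpers, and non-zero input vectors are assumed.

```scala
object SphericalCentroidSketch {
  type Vec = Array[Double]

  def norm(v: Vec): Double = math.sqrt(v.map(x => x * x).sum)

  // Scale to unit Euclidean length; assumes v is not the zero vector.
  def normalize(v: Vec): Vec = {
    val n = norm(v)
    v.map(_ / n)
  }

  // Component-wise arithmetic mean; assumes a non-empty cluster of equal-length vectors.
  def mean(points: Seq[Vec]): Vec = {
    val dim = points.head.length
    Array.tabulate(dim)(i => points.map(_(i)).sum / points.size)
  }

  def cosine(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum / (norm(a) * norm(b))

  // Center for cosine distance: mean of the unit-normalized points, re-normalized.
  def cosineCentroid(points: Seq[Vec]): Vec = normalize(mean(points.map(normalize)))

  // Objective being maximized: summed cosine similarity between points and the center.
  def totalSimilarity(points: Seq[Vec], center: Vec): Double =
    points.map(cosine(_, center)).sum
}
```

For the two points `(10, 0)` and `(0, 1)`, the plain arithmetic mean `(5, 0.5)` is pulled toward the longer vector and scores a lower total similarity than the centroid computed from the normalized points.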
[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/20435 [SPARK-23268][SQL]Reorganize packages in data source V2 ## What changes were proposed in this pull request? 1. Create a new package for partitioning/distribution related classes. As Spark will add new concrete implementations of `Distribution` in new releases, it is good to have a new package for partitioning/distribution related classes. 2. Move streaming related classes to package `org.apache.spark.sql.sources.v2.reader/writer.streaming`, instead of `org.apache.spark.sql.sources.v2.streaming.reader/writer`, so that there won't be a reader/writer package inside package streaming, which is quite confusing. ## How was this patch tested? Unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark new_pkg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20435.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20435 commit da29b863970b8ca5039d547ae5240e5c34f9f11a Author: Wang Gengliang Date: 2018-01-30T08:31:10Z create a new package for partitioning/distribution related classes commit 3dc56226ed17637c808df9d8d03f60fcbbae9e55 Author: Wang Gengliang Date: 2018-01-30T09:16:01Z re-org streaming
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86814/testReport)** for PR 20435 at commit [`3dc5622`](https://github.com/apache/spark/commit/3dc56226ed17637c808df9d8d03f60fcbbae9e55).
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20435 **[Test build #86814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86814/testReport)** for PR 20435 at commit [`3dc5622`](https://github.com/apache/spark/commit/3dc56226ed17637c808df9d8d03f60fcbbae9e55). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86814/ Test FAILed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Merged build finished. Test FAILed.
[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/20434#discussion_r164687283 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -660,12 +660,10 @@ object SQLConf { val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit") .internal() .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " + - "codegen. When the compiled function exceeds this threshold, " + - "the whole-stage codegen is deactivated for this subtree of the current query plan. " + - s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " + - "this is a limit in the OpenJDK JVM implementation.") --- End diff -- The 8000 byte limit is a HotSpot-specific thing, but the 64KB limit is imposed by the Java Class File format, as a part of the JVM spec. We may want to wordsmith a bit here to explain that: 1. 65535 is the largest bytecode size possible for a valid Java method; setting the default value to 65535 effectively turns the limit off for whole-stage codegen; 2. For those who wish to turn this limit on when running on HotSpot, it may be preferable to set the value to `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT` to match HotSpot's implementation. I don't have a good concrete suggestion as to how to concisely express these two points in the doc string, though.
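The two points above can be made concrete with a small sketch of the threshold check. It assumes the fallback decision is a simple "largest generated method exceeds the limit" comparison. `shouldDisableWholeStageCodegen` and the constant names below are hypothetical and are not Spark's identifiers.

```scala
object HugeMethodLimitSketch {
  // HotSpot's default threshold above which a method is not JIT-compiled (the 8000 bytes mentioned above).
  val HotSpotHugeMethodLimit = 8000

  // Largest bytecode size the class file format allows for a single method.
  val MaxJvmMethodBytecodeSize = 65535

  // Hypothetical check: give up on whole-stage codegen when the largest generated method exceeds the limit.
  def shouldDisableWholeStageCodegen(maxMethodBytecodeSize: Int, hugeMethodLimit: Int): Boolean =
    maxMethodBytecodeSize > hugeMethodLimit
}
```

Because no valid method can exceed 65535 bytes, a limit of 65535 never triggers under this check, which is why that default effectively turns the limit off, while a limit of 8000 would trigger for a 9000-byte method.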
[GitHub] spark issue #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkSession ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20404 To be fully correct, I think we should hold a lock with the JVM instance, but I wonder if that's easily possible. I roughly knew about this, but I think I underestimated it because I believed it was quite unlikely to happen. I think reverting https://github.com/apache/spark/commit/cc4b8510c1445fb742c0d750958d352adfa84902 doesn't fully resolve the issue because the same thing can also happen between the `if` and the next line.
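The gap between the `if` and the next line is a check-then-act race. The sketch below is an in-process analogy only: the real case spans Python and the JVM, which a single JVM lock does not cover. `Session` and `SessionHolder` are hypothetical names.

```scala
object SessionLockSketch {
  final class Session(val id: Int)

  class SessionHolder {
    private var default: Option[Session] = None
    private var createdCount = 0

    // Unsafe variant: two threads can both see `default.isEmpty` and both create a session.
    def getOrCreateUnsafe(): Session = {
      if (default.isEmpty) {
        createdCount += 1
        default = Some(new Session(createdCount))
      }
      default.get
    }

    // Safe variant: the check and the assignment happen under one lock.
    def getOrCreate(): Session = synchronized {
      if (default.isEmpty) {
        createdCount += 1
        default = Some(new Session(createdCount))
      }
      default.get
    }

    // How many sessions were ever created (exposed for illustration only).
    def created: Int = synchronized(createdCount)
  }
}
```

The two variants have identical bodies, and the only difference is that the locked one makes the check and the update a single atomic step.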
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Merged build finished. Test PASSed.
[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/378/ Test PASSed.