[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86660/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20224 Thanks! Merged to master/2.3
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86660/testReport)** for PR 20224 at commit [`fd8983e`](https://github.com/apache/spark/commit/fd8983edaee5fe9a7968de45a174d6781128296e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86660/testReport)** for PR 20224 at commit [`fd8983e`](https://github.com/apache/spark/commit/fd8983edaee5fe9a7968de45a174d6781128296e).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/247/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86655/testReport)** for PR 20224 at commit [`b2e2cb0`](https://github.com/apache/spark/commit/b2e2cb0db7c3e0dadf84dd4a1b5d2fbcbae5394f).

* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86655/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86655/testReport)** for PR 20224 at commit [`b2e2cb0`](https://github.com/apache/spark/commit/b2e2cb0db7c3e0dadf84dd4a1b5d2fbcbae5394f).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/244/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86638/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86638/testReport)** for PR 20224 at commit [`ce8171a`](https://github.com/apache/spark/commit/ce8171aa115d05d0a198964a64e7ef60d9637502).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86636/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86636/testReport)** for PR 20224 at commit [`5c99777`](https://github.com/apache/spark/commit/5c99777a6a9aa21905158d53f9e393c0ad7acc9f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86632/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86631/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86632/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86631/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 Updated again to address @cloud-fan 's comments: removed an unneeded test case and added a few more comments.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86638/testReport)** for PR 20224 at commit [`ce8171a`](https://github.com/apache/spark/commit/ce8171aa115d05d0a198964a64e7ef60d9637502).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/229/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86636/testReport)** for PR 20224 at commit [`5c99777`](https://github.com/apache/spark/commit/5c99777a6a9aa21905158d53f9e393c0ad7acc9f).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/227/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224

Updated again. Addressed @viirya 's comments:

1. added comments to explain where this codegen stage ID is used
2. moved an assertion message to a comment.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86632/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/224/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20224 LGTM with minor comments.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20224 retest this please.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86631/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86627/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).

* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86627/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20224 LGTM, pending jenkins
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86627/testReport)** for PR 20224 at commit [`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/222/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20224 LGTM
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86612/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86612/testReport)** for PR 20224 at commit [`e449216`](https://github.com/apache/spark/commit/e449216392402510444ecd002c22e884bcbed2fc).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `.doc("When true, embed the codegen stage ID into the class name of the generated class")`
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224

Updated the PR:

1. Addressed @cloud-fan 's comment to make sure the `codegenStageId` is properly copied in transformations after `CollapseCodegenStages`. Added a new unit test case for it. The test case triggers `ReuseExchange`, which is a rule that runs after `CollapseCodegenStages`.

Before this update, the explain output for the test query is:
```
== Physical Plan ==
*(0) Project [id#7L]
+- *(0) SortMergeJoin [id#7L], [id#10L], Inner
   :- *(2) Sort [id#7L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#7L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(0) Sort [id#10L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#10L], Exchange hashpartitioning(id#7L, 200)
```
Note that the `*(0)`s indicate that the `codegenStageId`s are not properly copied. After this update, it is now:
```
== Physical Plan ==
*(5) Project [id#0L]
+- *(5) SortMergeJoin [id#0L], [id#3L], Inner
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(4) Sort [id#3L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#3L], Exchange hashpartitioning(id#0L, 200)
```

2. Flipped the default value of the new conf option "spark.sql.codegen.wholeStage.useIdInClassName" to true.
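The copying concern above can be illustrated in plain Scala, without Spark. This is only a sketch: `Stage` is a hypothetical stand-in for a node declared like `WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`, and it does not reproduce how Spark's own tree transformations copy nodes. It shows two language-level facts about a case class with a second parameter list: the generated `copy` has no default for that list, so the ID must be passed along explicitly, and the generated `equals` ignores it.

```scala
// Hypothetical stand-in for a plan node with the ID in a second parameter list.
// The name `Stage` and the String child are illustrative only, not Spark's types.
final case class Stage(child: String)(val codegenStageId: Int)

object CurriedCopyDemo {
  // Rewrites the child while carrying the ID over explicitly.
  // `copy` supplies a default for `child` but not for `codegenStageId`.
  def withChild(stage: Stage, newChild: String): Stage =
    stage.copy(child = newChild)(stage.codegenStageId)

  def main(args: Array[String]): Unit = {
    val original = Stage("Sort")(4)
    val rewritten = withChild(original, "Sort over ReusedExchange")

    // The ID survives only because withChild passed it along.
    assert(rewritten.codegenStageId == 4)

    // Generated equality compares the first parameter list only.
    assert(Stage("Sort")(4) == Stage("Sort")(0))

    println(s"*(${rewritten.codegenStageId}) ${rewritten.child}")
  }
}
```

Keeping the ID out of the first parameter list means two stages that differ only by ID still compare equal, which is presumably why the copy path needs separate attention.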
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/211/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86612/testReport)** for PR 20224 at commit [`e449216`](https://github.com/apache/spark/commit/e449216392402510444ecd002c22e884bcbed2fc).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224

BTW, inspired by @cloud-fan 's comment, here's an example of the codegen stage IDs when scalar subqueries are involved:
```scala
val sub = "(select sum(id) from range(5))"
val df = spark.sql(s"select $sub as a, $sub as b")
df.explain(true)
```
would give:
```
== Parsed Logical Plan ==
'Project [scalar-subquery#0 [] AS a#1, scalar-subquery#2 [] AS b#3]
:  :- 'Project [unresolvedalias('sum('id), None)]
:  :  +- 'UnresolvedTableValuedFunction range, [5]
:  +- 'Project [unresolvedalias('sum('id), None)]
:     +- 'UnresolvedTableValuedFunction range, [5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: bigint, b: bigint
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Optimized Logical Plan ==
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Physical Plan ==
*(1) Project [Subquery subquery0 AS a#1L, Subquery subquery2 AS b#3L]
:  :- Subquery subquery0
:  :  +- *(2) HashAggregate(keys=[], functions=[sum(id#14L)], output=[sum(id)#16L])
:  :     +- Exchange SinglePartition
:  :        +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#14L)], output=[sum#21L])
:  :           +- *(1) Range (0, 5, step=1, splits=8)
:  +- Subquery subquery2
:     +- *(2) HashAggregate(keys=[], functions=[sum(id#17L)], output=[sum(id)#19L])
:        +- Exchange SinglePartition
:           +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#17L)], output=[sum#23L])
:              +- *(1) Range (0, 5, step=1, splits=8)
+- Scan OneRowRelation[]
```
The IDs look a bit "odd" (there are three separate codegen stages with ID 1) because the main "spine" query and each individual subquery are "planned" separately, so each runs `CollapseCodegenStages` separately, counting up from 1 afresh.

I would consider this behavior acceptable, but I wonder what others would think in this case. If this behavior for subqueries is not acceptable, I'll have to find alternative places to put the initialization and reset of the thread-local ID counter.
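The numbering behavior described above can be modeled with a small plain-Scala sketch. Everything here is an assumption made for illustration: `StageIdCounter`, `planStages`, and the reset-per-planning-run policy are hypothetical stand-ins for the thread-local ID counter the comment mentions, not the PR's actual code. The stage counts (one for the main query, two for each subquery) are read off the physical plan shown in the comment.

```scala
// Hypothetical thread-local counter; names are illustrative, not Spark's.
object StageIdCounter {
  private val current: ThreadLocal[Int] = new ThreadLocal[Int] {
    override protected def initialValue(): Int = 0
  }

  // Called at the start of each planning run in this model.
  def reset(): Unit = current.set(0)

  // Hands out 1, 2, 3, ... within one planning run on the current thread.
  def next(): Int = {
    val id = current.get() + 1
    current.set(id)
    id
  }
}

object SubqueryPlanningDemo {
  // Models one independent planning run that produces `numStages` codegen stages.
  def planStages(numStages: Int): Seq[Int] = {
    StageIdCounter.reset()
    Seq.fill(numStages)(StageIdCounter.next())
  }

  def main(args: Array[String]): Unit = {
    val mainQuery = planStages(1) // the outer Project
    val subquery0 = planStages(2) // partial and final aggregate stages
    val subquery2 = planStages(2)

    assert(mainQuery == Seq(1))
    assert(subquery0 == Seq(1, 2))
    assert(subquery2 == Seq(1, 2))

    // Three separately planned pieces each contain a stage with ID 1.
    val piecesWithIdOne = Seq(mainQuery, subquery0, subquery2).count(_.contains(1))
    assert(piecesWithIdOne == 3)
  }
}
```

Under this model, making IDs unique across subqueries would mean resetting the counter once per top-level query rather than once per planning run, which corresponds to the alternative placement the comment alludes to.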
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 also ping @cloud-fan
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86521/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224

**[Test build #86521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `.doc("When true, embed the codegen stage ID into the class name of the generated class")`
  * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/133/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 jenkins retest this please
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86515/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86515/testReport)** for PR 20224 at commit [`a0162aa`](https://github.com/apache/spark/commit/a0162aacb6e6e88057819e878fc2ddd7ed9ceb91). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `.doc("When true, embed the codegen stage ID into the class name of the generated class")` * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86516/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86516/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `.doc("When true, embed the codegen stage ID into the class name of the generated class")` * ` final class $className extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86516/testReport)** for PR 20224 at commit [`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/128/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #86515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86515/testReport)** for PR 20224 at commit [`a0162aa`](https://github.com/apache/spark/commit/a0162aacb6e6e88057819e878fc2ddd7ed9ceb91).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/127/ Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test PASSed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 I've updated the PR to address @gatorsmile's comments: I moved the new utility code to the `WholeStageCodegenId` object and added a new test case in `HiveExplainSuite`. Ping @gatorsmile @kiszk @maropu @viirya to have a second look. Thanks!
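A minimal sketch of the kind of check an explain-output test might perform. It assumes the `*(N) ` prefix format that appears in the explain output quoted elsewhere in this thread. The helper `stageIds` and the sample string are hypothetical; this is not the test that was added to `HiveExplainSuite`.

```scala
// Matches the "*(N) " prefix that marks an operator inside codegen stage N.
val stagePrefix = """\*\((\d+)\) """.r

/** Collect the distinct codegen stage IDs mentioned in an explain string. */
def stageIds(explain: String): Seq[Int] =
  stagePrefix.findAllMatchIn(explain).map(_.group(1).toInt).toList.distinct.sorted

// Hand-written sample in the same shape as the thread's explain output.
val sampleExplain =
  """*(2) Sort [x#3L ASC NULLS FIRST], false, 0
    |+- Exchange hashpartitioning(x#3L, 200)
    |   +- *(1) Project [(id#0L % 2) AS x#3L]
    |      +- *(1) Range (0, 5, step=1, splits=8)""".stripMargin

val ids = stageIds(sampleExplain)
```

Operators without the prefix, such as `Exchange` here, are simply not part of any codegen stage and contribute no ID.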
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 Thanks @gatorsmile! Will add a new test case in `HiveExplainSuite`.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20224 Overall, the proposal looks good to me. We need a test case in `HiveExplainSuite`.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20224 As a high-level comment, adding IDs helps with performance and error diagnosis in production environments. I strongly support always enabling this. Let me look at the technical details later.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 Thanks for your comments, @viirya! I'd say having only (1) and (2) makes it much less useful than having all 3, but it's still useful on its own for helping people understand exactly which physical operators were fused into a single codegen stage (as opposed to assuming adjacent codegen'd operators are always in the same codegen stage). The `SortMergeJoin` case is one where I really wished we had such an ID readily available in the explain output. I had learned the hacky implementation of SMJ the hard way... With (3) and the new proposal of reserving `references[0]` for the codegenStageId, I'm sure it'll be useful for some of your use cases (especially codegen-related development), too. Do you have any use cases off the top of your head, or any suggestions as to whether or not such an ID makes sense in general? Thanks!
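A minimal sketch of the `references[0]` idea, assuming the convention that slot 0 of the references array holds the codegen stage ID. The class `GeneratedStage` and its `process` method are hypothetical stand-ins for a generated class; they are not the code Spark emits.

```scala
// Hypothetical stand-in for a generated whole-stage class.
// Assumed convention: references(0) holds the codegen stage ID.
final class GeneratedStage(references: Array[Any]) {
  private def stageId: Int = references(0).asInstanceOf[Int]

  /** Toy per-row logic that reports the stage ID when it fails. */
  def process(row: Int): Int = {
    if (row < 0) {
      throw new IllegalStateException(
        s"Error in codegen stage $stageId: negative input $row")
    }
    row * 2
  }
}

// The class text is the same for every stage; only the array contents differ.
val stage3 = new GeneratedStage(Array[Any](3, "another runtime constant"))
val message =
  try { stage3.process(-1); "no error" }
  catch { case e: IllegalStateException => e.getMessage }
```

Since the ID lives in runtime data rather than in the source text, two stages with identical logic would still share one source string.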
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20224 > Would you (@kiszk and @maropu ) agree that at least having both (1) and (2) is a good idea? Without (3), is this still useful if we only have (1) and (2)? It may not be very useful if we only have the codegen ID in the explain output.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 Thanks for your comments and questions, @kiszk and @maropu! Let me address them in a couple of separate points.

**tl;dr** On top of my original proposal in the PR description / JIRA ticket, I'd like to further add:

a. A config option to choose whether or not to include the `codegenStageId` in the generated class name. The default should be "off", meaning the ID is not included in the class name.

b. To reserve the `[0]` element in the `references` array of the WSC generated class as a special value, to record the codegen stage ID. That way, if we need to throw an exception from the generated code, we can include the codegen stage ID when constructing the exception message string. This doesn't add any new IDs to the generated code, so @kiszk's concerns about the codegen cache can be addressed. This can always be turned on.

Side note: even if we only embed the codegen stage ID into comments, because of the way the codegen cache uses `CodeAndComment` as the key, differences in comments will still affect the cache hit rate. On the other hand, putting the codegen stage ID into the `references` array solves this problem perfectly; this array is Spark SQL's way of expressing a "runtime constant pool" anyway. This idea is somewhat similar to how HotSpot VM's "LambdaForm bytecode sharing" works.

**Detailed Discussion**

My proposal and PR currently do 3 things:

1. Add a per-query `codegenStageId` to `WholeStageCodegenExec`;
2. Include the ID as a part of the explain output for physical plans;
3. Include the ID as a part of the generated class name for WSC.

Of the above, (1) is the foundation, while (2) and (3) are separate applications of the information from (1).

Would you (@kiszk and @maropu) agree that at least having both (1) and (2) is a good idea? They don't interact with anything else at runtime, so there is no behavioral change or performance implication because of them. They can always be turned on with minimal overhead. @rxin did point out that our current explain output for physical plans is already pretty cluttered and not user-friendly enough, so it makes sense to have a "verbose mode" in the future and then make the default mode less cluttered. But that's out of scope for this change.

For (3), @kiszk does point out that there's an interaction between the generated code (in source string + comments form) and the codegen cache (from `CodeAndComment` -> generated class, in `org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator#cache`). I do know about this cache and took this interaction into consideration when I sketched out this proposal.

This PR proposes an ID unique within a query. If the same query is run multiple times, it'll generate the exact same code (with the IDs included), so at least with the current implementation, we can guarantee that there won't be redundant compilation for multiple runs of the same query. I mentioned this in the PR description:

> The reason why this proposal uses a per-query ID is because it's stable within a query, so that multiple runs of the same query will see the same resulting IDs. This both benefits understandability for users, and also it plays well with the codegen cache in Spark SQL which uses the generated source code as the key.

This kind of codegen cache hit is fundamental, and this PR keeps it working.

Within a query, though, before this change there could have been codegen stages that happen to have the exact same source code, and thus would work well with the codegen cache. After this change, such stages would end up generating code with different IDs embedded into the class name, so they'll have different source code, won't hit the codegen cache, and will have to be compiled separately.

Here's an example that would hit this case:

```
// Set the broadcast threshold to 1 byte so the join is planned as a sort-merge join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 1)
val df1 = spark.range(5).select('id % 2 as 'x)
val df2 = spark.range(5).select('id % 2 as 'y)
val query = df1.join(df2, 'x === 'y)
```

With this change, you can see the different codegen stages as follows:

```
scala> query.explain
== Physical Plan ==
*(5) SortMergeJoin [x#3L], [y#9L], Inner
:- *(2) Sort [x#3L ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(x#3L, 200)
:     +- *(1) Project [(id#0L % 2) AS x#3L]
:        +- *(1) Filter isnotnull((id#0L % 2))
:           +- *(1) Range (0, 5, step=1, splits=8)
+- *(4) Sort [y#9L ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(y#9L, 200)
      +- *(3) Project [(id#6L % 2) AS y#9L]
         +- *(3) Filter isnotnull((id#6L % 2))
            +- *(3) Range (0, 5, step=1, splits=8)
```
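A toy sketch of the cache interaction discussed in this comment, assuming a cache keyed purely by generated source text. The names `className`, `source`, and `ToyCodegenCache`, the `embedId` flag, and the class-name strings are all hypothetical; Spark's real cache is keyed by `CodeAndComment` and compiles Java source, which this sketch does not attempt to model.

```scala
import scala.collection.mutable

// Hypothetical flag mirroring the proposed config option for the class name.
def className(stageId: Int, embedId: Boolean): String =
  if (embedId) s"GeneratedIteratorStage$stageId" else "GeneratedIterator"

// Toy "generated source": a class header wrapped around an opaque body.
def source(stageId: Int, embedId: Boolean, body: String): String =
  s"final class ${className(stageId, embedId)} { $body }"

// Toy cache keyed by source text; `compilations` counts cache misses.
final class ToyCodegenCache {
  private val cache = mutable.Map.empty[String, Int]
  var compilations = 0
  def compile(src: String): Int =
    cache.getOrElseUpdate(src, { compilations += 1; compilations })
}

// Two stages assumed to have identical logic.
val body = "/* project id % 2 */"

val withoutId = new ToyCodegenCache
withoutId.compile(source(1, embedId = false, body))
withoutId.compile(source(3, embedId = false, body)) // same text: cache hit

val withId = new ToyCodegenCache
withId.compile(source(1, embedId = true, body))
withId.compile(source(3, embedId = true, body)) // different class name: miss
```

With the ID left out of the class name the second stage reuses the first compilation; with the ID embedded, each stage is compiled on its own.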
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20224 Do we always need to turn this on? It seems this is debug info for developers.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20224 I totally agree with adding a unique ID, because all code generated by whole-stage codegen has the same class name, which makes it hard to debug in a production environment. On the other hand, IIUC, the current implementation defeats the caching mechanism for the same query: adding a unique ID produces a different string of Java code. I am thinking about putting an ID related to a task into a comment or some other part instead.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85940/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #85940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85940/testReport)** for PR 20224 at commit [`fa25f72`](https://github.com/apache/spark/commit/fa25f7286120df7a52ec04e851d44cdcaa41c03c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` final class $generatedClassName extends $`
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 One comment on using `ThreadLocal[Integer]` for keeping track of the IDs: I did have an alternative implementation of this PR that declares `WholeStageCodegenExec` as:

```scala
case class WholeStageCodegenExec(child: SparkPlan)(private val codegenStageId: Int)
  extends UnaryExecNode with CodegenSupport
```

and then explicitly threads the `codegenStageId` through the recursion in `CollapseCodegenStages.insertWholeStageCodegen()`, so that the relationship between the auto-increment of IDs and the insertion order of `WholeStageCodegenExec`s is explicit. However, that turned out to be much more complicated than just using a `ThreadLocal[Integer]` and implicitly threading the IDs. So in the end I opted for the thread-local counter version instead.
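A minimal sketch of what explicitly threading the ID through the recursion can look like, assuming a toy plan tree. `Plan`, `Op`, `Stage`, and `insertStages` are hypothetical and far simpler than `SparkPlan` and `CollapseCodegenStages`; the sketch only shows that every recursive call has to accept the next free ID and hand back the updated one.

```scala
// Hypothetical mini plan tree; not Spark's SparkPlan.
sealed trait Plan
case class Op(name: String, codegen: Boolean, children: List[Plan]) extends Plan
case class Stage(id: Int, child: Plan) extends Plan

/**
 * Wrap each maximal run of codegen-capable operators in a Stage. The next free
 * ID is passed into and returned from every call instead of living in a counter.
 * `insideStage` is true when the parent operator already belongs to a stage.
 */
def insertStages(plan: Plan, nextId: Int, insideStage: Boolean): (Plan, Int) = plan match {
  case Op(name, codegen, children) =>
    // Visit children left to right, carrying the counter from one to the next.
    val (newChildren, idAfterChildren) =
      children.foldLeft((List.empty[Plan], nextId)) { case ((done, id), child) =>
        val (newChild, next) = insertStages(child, id, insideStage = codegen)
        (done :+ newChild, next)
      }
    val rebuilt = Op(name, codegen, newChildren)
    // A codegen operator whose parent is not in a stage becomes a stage root;
    // it takes its ID only after its children have taken theirs.
    if (codegen && !insideStage) (Stage(idAfterChildren, rebuilt), idAfterChildren + 1)
    else (rebuilt, idAfterChildren)
  case stage: Stage => (stage, nextId) // already wrapped; leave unchanged
}

// Sort <- Exchange (no codegen) <- Project <- Range
val plan =
  Op("Sort", true, List(
    Op("Exchange", false, List(
      Op("Project", true, List(
        Op("Range", true, Nil)))))))

val (result, nextFree) = insertStages(plan, nextId = 1, insideStage = false)
```

Here `Project` and `Range` end up in stage 1 and `Sort` in stage 2, but the extra parameter and tuple return value on every call hint at why the thread-local version is less intrusive.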
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20224 **[Test build #85940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85940/testReport)** for PR 20224 at commit [`fa25f72`](https://github.com/apache/spark/commit/fa25f7286120df7a52ec04e851d44cdcaa41c03c).
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 jenkins retest this please
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85938/ Test FAILed.
[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20224 Merged build finished. Test FAILed.