[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86660/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20224
  
Thanks! Merged to master/2.3


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86660 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86660/testReport)**
 for PR 20224 at commit 
[`fd8983e`](https://github.com/apache/spark/commit/fd8983edaee5fe9a7968de45a174d6781128296e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86660 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86660/testReport)**
 for PR 20224 at commit 
[`fd8983e`](https://github.com/apache/spark/commit/fd8983edaee5fe9a7968de45a174d6781128296e).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/247/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86655 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86655/testReport)**
 for PR 20224 at commit 
[`b2e2cb0`](https://github.com/apache/spark/commit/b2e2cb0db7c3e0dadf84dd4a1b5d2fbcbae5394f).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86655/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86655 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86655/testReport)**
 for PR 20224 at commit 
[`b2e2cb0`](https://github.com/apache/spark/commit/b2e2cb0db7c3e0dadf84dd4a1b5d2fbcbae5394f).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/244/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86638/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86638 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86638/testReport)**
 for PR 20224 at commit 
[`ce8171a`](https://github.com/apache/spark/commit/ce8171aa115d05d0a198964a64e7ef60d9637502).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86636/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86636/testReport)**
 for PR 20224 at commit 
[`5c99777`](https://github.com/apache/spark/commit/5c99777a6a9aa21905158d53f9e393c0ad7acc9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86632/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86631/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86632/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86631/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Updated again to address @cloud-fan 's comments: removed an unneeded test case 
and added a few more comments.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86638/testReport)**
 for PR 20224 at commit 
[`ce8171a`](https://github.com/apache/spark/commit/ce8171aa115d05d0a198964a64e7ef60d9637502).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/229/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86636 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86636/testReport)**
 for PR 20224 at commit 
[`5c99777`](https://github.com/apache/spark/commit/5c99777a6a9aa21905158d53f9e393c0ad7acc9f).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/227/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Updated again. Addressed @viirya 's comments:
1. added comments to explain where this codegen stage ID is used
2. moved an assertion message to a comment.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86632 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86632/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/224/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20224
  
LGTM with minor comments.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20224
  
retest this please.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86631/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86627/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86627/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20224
  
LGTM, pending jenkins


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86627/testReport)**
 for PR 20224 at commit 
[`a11232e`](https://github.com/apache/spark/commit/a11232e162c50a1b9312410debb9fb7c4766f9a2).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/222/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20224
  
LGTM


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86612/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86612/testReport)**
 for PR 20224 at commit 
[`e449216`](https://github.com/apache/spark/commit/e449216392402510444ecd002c22e884bcbed2fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name of the generated class\")`
  * `case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Updated the PR:

1. addressed @cloud-fan 's comment to make sure the `codegenStageId` is 
properly copied in transformations after `CollapseCodegenStages`. Added a new 
unit test case for it.

The test case triggers `ReuseExchange`, which is a rule that runs after 
`CollapseCodegenStages`.
Before this update, the explain output for the test query is:
```
== Physical Plan ==
*(0) Project [id#7L]
+- *(0) SortMergeJoin [id#7L], [id#10L], Inner
   :- *(2) Sort [id#7L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#7L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(0) Sort [id#10L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#10L], Exchange hashpartitioning(id#7L, 200)
```
Note that the `*(0)` markers indicate that the `codegenStageId`s were not 
properly copied. After this update, the output is:
```
== Physical Plan ==
*(5) Project [id#0L]
+- *(5) SortMergeJoin [id#0L], [id#3L], Inner
   :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#0L, 200)
   :     +- *(1) Range (0, 100, step=1, splits=8)
   +- *(4) Sort [id#3L ASC NULLS FIRST], false, 0
      +- ReusedExchange [id#3L], Exchange hashpartitioning(id#0L, 200)
```

2. Flipped the default value of the new conf option 
"spark.sql.codegen.wholeStage.useIdInClassName" to true.

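As a rough illustration of why an ID in a second parameter list can get lost when a node is rebuilt, here is a minimal sketch in plain Scala. The `Stage` class and `withChild` helper are hypothetical stand-ins that only mirror the shape of `WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)`; how Spark itself forwards the ID during transformations is not shown here. The sketch relies on standard case-class behavior: the generated `copy` only supplies defaults for the first parameter list, and `equals` only compares the first list.

```scala
// Hypothetical stand-in for a plan node; not Spark's actual class.
case class Stage(child: String)(val stageId: Int) {
  // copy() only defaults the first parameter list, so the ID in the
  // second list has to be passed along explicitly on every rebuild.
  def withChild(newChild: String): Stage = copy(child = newChild)(stageId)
}

val original = Stage("Range")(5)

// Rebuild that forwards the ID explicitly.
val rewritten = original.withChild("ReusedExchange")

// Rebuild that supplies a placeholder instead (0 is an arbitrary value
// chosen for this illustration).
val dropped = original.copy(child = "ReusedExchange")(0)

println(s"rewritten: *(${rewritten.stageId}), dropped: *(${dropped.stageId})")
```

Because equality ignores the second parameter list, `rewritten == dropped` holds even though the IDs differ, so a lost ID would only show up in output such as the explain string.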

---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/211/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86612/testReport)**
 for PR 20224 at commit 
[`e449216`](https://github.com/apache/spark/commit/e449216392402510444ecd002c22e884bcbed2fc).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
BTW, inspired by @cloud-fan 's comment, here's an example of the codegen 
stage IDs when scalar subqueries are involved:

```scala
val sub = "(select sum(id) from range(5))"
val df = spark.sql(s"select $sub as a, $sub as b")
df.explain(true)
```
would give:
```
== Parsed Logical Plan ==
'Project [scalar-subquery#0 [] AS a#1, scalar-subquery#2 [] AS b#3]
:  :- 'Project [unresolvedalias('sum('id), None)]
:  :  +- 'UnresolvedTableValuedFunction range, [5]
:  +- 'Project [unresolvedalias('sum('id), None)]
:     +- 'UnresolvedTableValuedFunction range, [5]
+- OneRowRelation

== Analyzed Logical Plan ==
a: bigint, b: bigint
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Optimized Logical Plan ==
Project [scalar-subquery#0 [] AS a#1L, scalar-subquery#2 [] AS b#3L]
:  :- Aggregate [sum(id#14L) AS sum(id)#16L]
:  :  +- Range (0, 5, step=1, splits=None)
:  +- Aggregate [sum(id#17L) AS sum(id)#19L]
:     +- Range (0, 5, step=1, splits=None)
+- OneRowRelation

== Physical Plan ==
*(1) Project [Subquery subquery0 AS a#1L, Subquery subquery2 AS b#3L]
:  :- Subquery subquery0
:  :  +- *(2) HashAggregate(keys=[], functions=[sum(id#14L)], output=[sum(id)#16L])
:  :     +- Exchange SinglePartition
:  :        +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#14L)], output=[sum#21L])
:  :           +- *(1) Range (0, 5, step=1, splits=8)
:  +- Subquery subquery2
:     +- *(2) HashAggregate(keys=[], functions=[sum(id#17L)], output=[sum(id)#19L])
:        +- Exchange SinglePartition
:           +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#17L)], output=[sum#23L])
:              +- *(1) Range (0, 5, step=1, splits=8)
+- Scan OneRowRelation[]
```

The IDs look a bit "odd" (there are three separate codegen stages with ID 1) 
because the main "spine" query and each individual subquery are "planned" 
separately, so each of them runs `CollapseCodegenStages` separately, counting 
up from 1 afresh. I would consider this behavior acceptable, but I wonder what 
others think in this case.
If this behavior for subqueries is not acceptable, I'll have to find 
alternative places to put the initialization and reset of the thread-local ID 
counter.
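
To make the numbering concrete, here is a minimal sketch of a thread-local counter that is reset at the start of every planning pass. It is not the code in this PR: `StageIdCounter`, `PlanningSketch` and `planOnePass` are made-up names, and a "pass" is reduced to just handing out IDs.

```scala
// Hypothetical stand-in for the thread-local ID counter; not Spark's code.
object StageIdCounter {
  private val counter = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  // Called once at the start of every planning pass.
  def reset(): Unit = counter.set(0)

  // Returns the next ID for the current thread, starting from 1.
  def next(): Int = {
    val id = counter.get() + 1
    counter.set(id)
    id
  }
}

object PlanningSketch {
  // Models one independent planning pass that creates `numStages` codegen stages.
  def planOnePass(numStages: Int): Seq[Int] = {
    StageIdCounter.reset()
    Seq.fill(numStages)(StageIdCounter.next())
  }
}
```

With one stage in the main query and two in each subquery, three separate passes hand out ID 1 three times, which is the pattern in the physical plan above.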


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
also ping @cloud-fan 


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86521/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86521 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)**
 for PR 20224 at commit 
[`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name 
of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86521 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86521/testReport)**
 for PR 20224 at commit 
[`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/133/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
jenkins retest this please


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86515/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86515 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86515/testReport)**
 for PR 20224 at commit 
[`a0162aa`](https://github.com/apache/spark/commit/a0162aacb6e6e88057819e878fc2ddd7ed9ceb91).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name 
of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86516/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86516/testReport)**
 for PR 20224 at commit 
[`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `.doc(\"When true, embed the codegen stage ID into the class name 
of the generated class\")`
  * `  final class $className extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86516 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86516/testReport)**
 for PR 20224 at commit 
[`a7ceda2`](https://github.com/apache/spark/commit/a7ceda298f776bc195b0d2fbf447d886ca5af63e).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/128/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #86515 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86515/testReport)**
 for PR 20224 at commit 
[`a0162aa`](https://github.com/apache/spark/commit/a0162aacb6e6e88057819e878fc2ddd7ed9ceb91).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/127/
Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-22 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
I've updated the PR to address @gatorsmile 's comments: moved the new 
utility code to the `WholeStageCodegenId` object and added a new test case in 
`HiveExplainSuite`.

ping @gatorsmile @kiszk @maropu @viirya to have a second look. Thanks!


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-17 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Thanks @gatorsmile ! Will add a new test case in `HiveExplainSuite`.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20224
  
Overall, the proposal looks good to me. We need a test case in 
`HiveExplainSuite`.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20224
  
As a high-level comment, adding IDs helps performance and error diagnosis in 
production environments. I strongly support always enabling this.
Let me look at the technical details later.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Thanks for your comments, @viirya !

I'd say having only (1) and (2) makes it much less useful than having all 
3, but it's still useful on its own for helping people understand exactly which 
physical operators were fused into a single codegen stage (as opposed to 
assuming adjacent codegen'd operators are always in the same codegen stage).
The `SortMergeJoin` case is one where I really wished we had such an ID 
readily available in the explain output. I had learned the hacky 
implementation of SMJ the hard way...
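
As a rough illustration of reading those IDs back out of the explain output, here's a small sketch that groups operator names by the `*(N)` prefix. `ExplainStageIds` and `operatorsByStage` are hypothetical helpers, not part of Spark, and the sample text is abridged from the join example in this thread.

```scala
object ExplainStageIds {
  // Matches the "*(N) Operator" prefix used in the explain output.
  private val StagePrefix = """\*\((\d+)\)\s+(\w+)""".r

  // Groups operator names by the codegen stage ID printed in front of them.
  def operatorsByStage(explain: String): Map[Int, List[String]] =
    explain.split("\n").toList
      .flatMap { line =>
        StagePrefix.findFirstMatchIn(line).map(m => m.group(1).toInt -> m.group(2))
      }
      .groupBy(_._1)
      .map { case (id, ops) => id -> ops.map(_._2) }

  // Abridged physical plan; operators without "*(N)" are not codegen'd.
  val sample: String =
    """*(5) SortMergeJoin [x#3L], [y#9L], Inner
      |:- *(2) Sort [x#3L ASC NULLS FIRST], false, 0
      |:  +- Exchange hashpartitioning(x#3L, 200)
      |:     +- *(1) Project [(id#0L % 2) AS x#3L]
      |:        +- *(1) Range (0, 5, step=1, splits=8)
      |+- *(4) Sort [y#9L ASC NULLS FIRST], false, 0
      |   +- Exchange hashpartitioning(y#9L, 200)
      |      +- *(3) Range (0, 5, step=1, splits=8)""".stripMargin
}
```

Here the two `Sort` operators land in stages 2 and 4 and the `SortMergeJoin` in stage 5, even though they are adjacent in the tree, which is exactly the distinction the ID is meant to surface.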

With (3) and the new proposal of reserving `references[0]` for the 
codegenStageId, I'm sure it'll be useful for some of your use cases (especially 
codegen-related development), too. Do you have any use cases off the top of 
your head, or any suggestions as to whether or not such an ID makes sense in 
general?

Thanks!


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20224
  
> Would you (@kiszk and @maropu ) agree that at least having both (1) and 
> (2) is a good idea? 

Without (3), is this still useful if we only have (1) and (2)? It may not 
be very useful to have the codegen ID only in the explain output.



---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
Thanks for your comments and questions, @kiszk and @maropu !
Let me address them in a couple of separate points.

**tl;dr**

On top of my original proposal in the PR description / JIRA ticket, I'd 
like to further add:
a. A config option to choose whether or not to include the `codegenStageId` 
in the generated class name. The default should be "off" meaning not including 
the ID in the class name.
b. To reserve the `[0]` element in the `references` array of the WSC 
generated class as a special value, to record the codegen stage ID. That way, 
let's say if we need to throw an exception from the generated code, we can 
include the codegen stage ID when constructing the exception message string. 
This doesn't add any new IDs to the generated code, so @kiszk 's concerns on 
codegen cache can be addressed. This can be always turned on.

Side note: even if we only embed the codegen stage ID into comments, 
because of the way the codegen cache uses `CodeAndComment` as the key, 
differences in comments will still affect the cache hit effectiveness. On the 
other hand, putting the codegen stage ID into the `references` array solves 
this problem perfectly -- this array is Spark SQL's way of expressing a 
"runtime constant pool" anyway. This idea is somewhat similar to how HotSpot 
VM's "LambdaForm bytecode sharing" works.
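
A minimal sketch of idea (b), assuming slot 0 of `references` holds the stage ID as proposed. `GeneratedIteratorSketch` and `ReferencesSketch` are illustrative names only; the real generated class has a different shape.

```scala
// Hypothetical shape of a generated class; names are illustrative, not Spark's.
final class GeneratedIteratorSketch(references: Array[Any]) {
  // Assumption from the proposal above: slot 0 holds the codegen stage ID.
  private val codegenStageId: Int = references(0).asInstanceOf[Int]

  // Builds an error message that names the stage without changing the source text.
  def errorMessage(reason: String): String =
    s"[codegen stage $codegenStageId] $reason"
}

object ReferencesSketch {
  // Same class (same source), different stage IDs supplied at construction time.
  val stage1 = new GeneratedIteratorSketch(Array[Any](1, "other runtime constant"))
  val stage3 = new GeneratedIteratorSketch(Array[Any](3, "other runtime constant"))
}
```

Both instances share one class, so the source string used as the cache key stays identical while each instance still reports its own stage ID.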

**Detail Discussions** 

My proposal and PR currently do 3 things:
1. Add a per-query `codegenStageId` to `WholeStageCodegenExec`;
2. Include the ID as a part of the explain output for physical plans;
3. Include the ID as a part of the generated class name for WSC.

Of the above, (1) is the foundation, while (2) and (3) are separate 
applications of the information from (1).

Would you (@kiszk and @maropu ) agree that at least having both (1) and (2) 
is a good idea? They don't interact with anything else at runtime, so there is 
no behavioral change or performance implication because of them. They can be 
always turned on with minimal overhead.

@rxin did point out that our current explain output for physical plans is 
already pretty cluttered and not user-friendly enough, so it makes sense to 
have a "verbose mode" in the future and then make the default mode less 
cluttered. But that's out of scope for this change.

For (3), @kiszk does point out that there's an interaction between the 
generated code (in source string + comments form) and the codegen cache (from 
`CodeAndComment` -> generated class, in 
`org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator#cache`). I am 
aware of this cache and took this interaction into consideration when I 
sketched out this proposal.

This PR proposes an ID unique within a query. If the same query is run 
multiple times, it'll generate the exact same code (with the IDs included), so 
at least with the current implementation, we can guarantee that there won't be 
redundant compilation for multiple runs of the same query. I mentioned this in 
the PR description:
> The reason why this proposal uses a per-query ID is because it's stable 
> within a query, so that multiple runs of the same query will see the same 
> resulting IDs. This both benefits understandability for users, and also it 
> plays well with the codegen cache in Spark SQL which uses the generated source 
> code as the key.
This kind of codegen cache hit is fundamental, and this PR keeps it working.
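
Here's a toy model of a cache keyed by generated source, just to show the effect; it is not Spark's `CodeGenerator`, and `CompileCacheSketch`, `stageSource` and the class-name pattern are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical model of a cache keyed by generated source; not Spark's CodeGenerator.
object CompileCacheSketch {
  private val cache = mutable.Map.empty[String, Int]
  private var compilations = 0

  // Returns the number of real compilations so far, compiling only on a cache miss.
  def compile(source: String): Int = {
    cache.getOrElseUpdate(source, { compilations += 1; compilations })
    compilations
  }

  // Illustrative source text; the class-name pattern is an assumption.
  def stageSource(stageId: Int): String =
    s"final class GeneratedIterator_$stageId { /* same body */ }"
}
```

Compiling the source for stage 1 twice (two runs of the same query) costs one compilation, whereas an otherwise identical stage with ID 2 in its class name is a different key and costs a second one.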

Within a query, though, before this change there could be codegen stages 
that happen to have exactly the same source code and would therefore hit the 
codegen cache. After this change, such stages end up generating code with 
different IDs embedded into the class name, so they have different source 
code, won't hit the codegen cache, and have to be compiled separately.

Here's an example that would hit this case:
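```scala
// Assumes a spark-shell session, where `spark` and `spark.implicits._` are in scope.
// A tiny broadcast threshold keeps the join from being planned as a broadcast join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 1)
val df1 = spark.range(5).select('id % 2 as 'x)
val df2 = spark.range(5).select('id % 2 as 'y)
val query = df1.join(df2, 'x === 'y)
```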
With this change, you can see the different codegen stages as follows:
```
scala> query.explain
== Physical Plan ==
*(5) SortMergeJoin [x#3L], [y#9L], Inner
:- *(2) Sort [x#3L ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(x#3L, 200)
:     +- *(1) Project [(id#0L % 2) AS x#3L]
:        +- *(1) Filter isnotnull((id#0L % 2))
:           +- *(1) Range (0, 5, step=1, splits=8)
+- *(4) Sort [y#9L ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(y#9L, 200)
      +- *(3) Project [(id#6L % 2) AS y#9L]
         +- *(3) Filter isnotnull((id#6L % 2))
            +- *(3) Range (0, 5, step=1, splits=8)
```

[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20224
  
Do we always need to turn this on? It seems to be debug info for developers.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20224
  
I totally agree with adding a unique ID, because all of the code generated 
by whole-stage codegen has the same class name, which makes it hard for us to 
debug in a production environment.

On the other hand, IIUC, the current implementation defeats the caching 
mechanism for the same query, since adding a unique ID generates a different 
string for the Java code.

I am thinking about adding an ID related to a task into a comment or other 
parts.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85940/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #85940 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85940/testReport)**
 for PR 20224 at commit 
[`fa25f72`](https://github.com/apache/spark/commit/fa25f7286120df7a52ec04e851d44cdcaa41c03c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  final class $generatedClassName extends $`


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
One comment on using `ThreadLocal[Integer]` to keep track of the IDs: I did 
have an alternative implementation of this PR that declares 
`WholeStageCodegenExec` as:
```scala
case class WholeStageCodegenExec(child: SparkPlan)(private val codegenStageId: Int)
    extends UnaryExecNode with CodegenSupport
```
and then explicitly threads the `codegenStageId` recursively through 
`CollapseCodegenStages.insertWholeStageCodegen()`, so that the relationship 
between the auto-increment of IDs and the insertion order of 
`WholeStageCodegenExec`s is explicit.

However, that turned out to be much more complicated than just using a 
`ThreadLocal[Integer]` and implicitly threading the IDs. So in the end I opted 
for the thread-local counter version instead.
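
For a feel of what the explicit version involves, here's a sketch on a toy plan tree. `Plan`, `Leaf`, `Node`, `Stage` and `ExplicitThreading` are hypothetical and much simpler than `SparkPlan` and the real `CollapseCodegenStages` rule; the point is only that every recursive call has to take the next free ID and hand the updated one back.

```scala
// Toy plan nodes, hypothetical; not Spark's SparkPlan hierarchy.
sealed trait Plan
case class Leaf(name: String) extends Plan
case class Node(name: String, children: List[Plan]) extends Plan
case class Stage(id: Int, child: Plan) extends Plan

object ExplicitThreading {
  // Wraps every Leaf in a Stage, passing the next free ID down and back up.
  def insert(plan: Plan, nextId: Int): (Plan, Int) = plan match {
    case leaf: Leaf =>
      (Stage(nextId, leaf), nextId + 1)
    case Node(name, children) =>
      // Each child consumes IDs, so the counter must flow through the fold.
      val (newChildren, idAfter) =
        children.foldLeft((List.empty[Plan], nextId)) {
          case ((acc, cur), child) =>
            val (newChild, next) = insert(child, cur)
            (acc :+ newChild, next)
        }
      (Node(name, newChildren), idAfter)
    case stage: Stage =>
      (stage, nextId)
  }
}
```

Every transformation that may insert a stage needs this extra parameter and return value, which is the bookkeeping the thread-local counter avoids.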


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20224
  
**[Test build #85940 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85940/testReport)**
 for PR 20224 at commit 
[`fa25f72`](https://github.com/apache/spark/commit/fa25f7286120df7a52ec04e851d44cdcaa41c03c).


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/20224
  
jenkins retest this please


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85938/
Test FAILed.


---




[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20224
  
Merged build finished. Test FAILed.


---
