[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11391


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190803769
  
LGTM, merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190721673
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190721677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52237/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190721018
  
**[Test build #52237 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52237/consoleFull)**
 for PR 11391 at commit 
[`3d1e397`](https://github.com/apache/spark/commit/3d1e3972d65ef17e02c21affab97346ebafc156b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190681436
  
**[Test build #52237 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52237/consoleFull)**
 for PR 11391 at commit 
[`3d1e397`](https://github.com/apache/spark/commit/3d1e3972d65ef17e02c21affab97346ebafc156b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190613392
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/5/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190613374
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190612590
  
**[Test build #5 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
 for PR 11391 at commit 
[`b64e52d`](https://github.com/apache/spark/commit/b64e52d189e5041cc1af2ebb0d656c5f5c12c82d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190608352
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190608354
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52219/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190608007
  
**[Test build #52219 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52219/consoleFull)**
 for PR 11391 at commit 
[`8d254d2`](https://github.com/apache/spark/commit/8d254d206686dd9c6edd053d4abcd184799fcc2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529927
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 
p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
 assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+val df = sqlContext.range(1).limit(100).sort(col("id"))
+val plan = df.queryExecution.executedPlan
+
+assert(plan.find(p =>
+  p.isInstanceOf[WholeStageCodegen] &&
+p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

Agreed. Let me remove this later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529803
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 
p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
 assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+val df = sqlContext.range(1).limit(100).sort(col("id"))
+val plan = df.queryExecution.executedPlan
+
+assert(plan.find(p =>
+  p.isInstanceOf[WholeStageCodegen] &&
+p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

These kind of tests are easy to break, we may don't need this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54529510
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 
p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
 assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+val df = sqlContext.range(1).limit(100).sort(col("id"))
+val plan = df.queryExecution.executedPlan
+
+assert(plan.find(p =>
+  p.isInstanceOf[WholeStageCodegen] &&
+p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

Yeah, because we can't leave limit as last operator otherwise it will 
transform to collect limit, so I add a sort here. I will remove it once I am 
back to laptop (few hours later).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190575061
  
**[Test build #5 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/5/consoleFull)**
 for PR 11391 at commit 
[`b64e52d`](https://github.com/apache/spark/commit/b64e52d189e5041cc1af2ebb0d656c5f5c12c82d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54528196
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -78,4 +78,21 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 
p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort]).isDefined)
 assert(df.collect() === Array(Row(1), Row(2), Row(3)))
   }
+
+  test("Limit should be included in WholeStageCodegen") {
+val df = sqlContext.range(1).limit(100).sort(col("id"))
+val plan = df.queryExecution.executedPlan
+
+assert(plan.find(p =>
+  p.isInstanceOf[WholeStageCodegen] &&
+p.asInstanceOf[WholeStageCodegen].plan.isInstanceOf[Sort] &&
--- End diff --

The sort is not related to limit, could you remove it from this PR? (we may 
revert the commit for sort)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190567560
  
**[Test build #52219 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52219/consoleFull)**
 for PR 11391 at commit 
[`8d254d2`](https://github.com/apache/spark/commit/8d254d206686dd9c6edd053d4abcd184799fcc2a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54526775
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java 
---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

yeah. I am updating it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54525690
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java 
---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

We could use `addMutableState`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54525659
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java 
---
@@ -35,6 +35,8 @@
   // used when there is no column in output
   protected UnsafeRow unsafeRow = new UnsafeRow(0);
 
+  protected boolean stopEarly = false;
--- End diff --

Since `stopEarly` is only accessed generated functions, we don't need this 
anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551990
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551991
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52213/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190551827
  
**[Test build #52213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)**
 for PR 11391 at commit 
[`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190526392
  
**[Test build #52213 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52213/consoleFull)**
 for PR 11391 at commit 
[`c887cf4`](https://github.com/apache/spark/commit/c887cf47a36da8d34e33afeca273f415df629fbb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54517293
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java
 ---
@@ -765,6 +765,9 @@ private void readIntBatch(int rowId, int num, 
ColumnVector column) throws IOExce
   } else if (DecimalType.is64BitDecimalType(column.dataType())) {
 defColumn.readIntsAsLongs(
 num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
+  } else if (column.dataType() == DataTypes.ShortType) {
--- End diff --

Ah, it has been merged, let me sync with it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54451555
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java 
---
@@ -64,7 +66,16 @@ protected void append(InternalRow row) {
* If it returns true, the caller should exit the loop (return from 
processNext()).
*/
   protected boolean shouldStop() {
-return !currentRows.isEmpty();
+return !currentRows.isEmpty() || stopEarly;
--- End diff --

I think we could only override `shouldStop()` for Limit, so we won't add a 
branch when we don't have Limit.

```
ctx.addFunction("shouldStop", """
  protected boolean shouldStop() {
return !currentRows.isEmpty() || stopEarly;
 }""")
```

Please be care full that we can't have two `shouldStop()` in generated 
class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54450500
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala
 ---
@@ -70,6 +70,20 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
 */
   }
 
+  ignore("range/limit/sum") {
+val N = 500 << 20
+runBenchmark("range/limit/sum", N) {
+  sqlContext.range(N).limit(100).groupBy().sum().collect()
--- End diff --

For 100 rows, we may can't really see the performance difference, could you 
increase it to 1 million?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54450174
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -56,6 +57,28 @@ trait BaseLimit extends UnaryNode {
   protected override def doExecute(): RDD[InternalRow] = 
child.execute().mapPartitions { iter =>
 iter.take(limit)
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  protected override def doProduce(ctx: CodegenContext): String = {
+child.asInstanceOf[CodegenSupport].produce(ctx, this)
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): 
String = {
+val countTerm = ctx.freshName("count")
+ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
+ctx.currentVars = input
--- End diff --

We don't need to update `currentVars` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54449722
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java
 ---
@@ -765,6 +765,9 @@ private void readIntBatch(int rowId, int num, 
ColumnVector column) throws IOExce
   } else if (DecimalType.is64BitDecimalType(column.dataType())) {
 defColumn.readIntsAsLongs(
 num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
+  } else if (column.dataType() == DataTypes.ShortType) {
--- End diff --

Could you pull this out as an separate PR? so we can merge it quickly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-29 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-190200340
  
cc @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189244923
  
Mark it as WIP?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/11391#discussion_r54235113
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -43,12 +44,33 @@ case class CollectLimit(limit: Int, child: SparkPlan) 
extends UnaryNode {
 child.execute(), child.output, SinglePartition, serializer))
 shuffled.mapPartitionsInternal(_.take(limit))
   }
+
+  override def upstreams(): Seq[RDD[InternalRow]] = {
+child.asInstanceOf[CodegenSupport].upstreams()
+  }
+
+  protected override def doProduce(ctx: CodegenContext): String = {
+child.asInstanceOf[CodegenSupport].produce(ctx, this)
+  }
+
+  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): 
String = {
+val countTerm = ctx.freshName("count")
+ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
+ctx.currentVars = input
+s"""
+   | while ($countTerm < $limit) {
+   |   $countTerm += 1;
+   |   ${consume(ctx, ctx.currentVars)}
+   | }
+   | if (true) return;
--- End diff --

This is special?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189231072
  
Just realized this implementation is wrong. Will fix it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189223865
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189223746
  
**[Test build #52046 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52046/consoleFull)**
 for PR 11391 at commit 
[`a213ed1`](https://github.com/apache/spark/commit/a213ed141fe313ef43de0722d52966b5eda1a8ce).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CollectLimit(limit: Int, child: SparkPlan) extends 
UnaryNode with CodegenSupport `
  * `trait BaseLimit extends UnaryNode with CodegenSupport `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189223869
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52046/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11391#issuecomment-189209869
  
**[Test build #52046 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52046/consoleFull)**
 for PR 11391 at commit 
[`a213ed1`](https://github.com/apache/spark/commit/a213ed141fe313ef43de0722d52966b5eda1a8ce).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13511][SQL] Add wholestage codegen for ...

2016-02-26 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/11391

[SPARK-13511][SQL] Add wholestage codegen for limit

JIRA: https://issues.apache.org/jira/browse/SPARK-13511

## What changes were proposed in this pull request?

Current limit operator doesn't support wholestage codegen. This is open to 
add support for it.

## How was this patch tested?

A test is added into BenchmarkWholeStageCodegen.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 wholestage-limit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11391.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11391


commit a213ed141fe313ef43de0722d52966b5eda1a8ce
Author: Liang-Chi Hsieh 
Date:   2016-02-26T10:27:01Z

Add wholestage codegen for limit.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org