[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2016-01-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-170757463
  
Let's close the limit push-down pull requests. We will need to design this 
more carefully, because it is expensive to push down large limits.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2016-01-11 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-170757733
  
Sure, let me close it.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2016-01-11 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/10451





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167954886
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48456/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167954987
  
**[Test build #48461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48461/consoleFull)** for PR 10451 at commit [`7cf955f`](https://github.com/apache/spark/commit/7cf955f5fd0546af85f247054a8a48e3fd54ae2e).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167954885
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167972021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48465/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167971953
  
**[Test build #48465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48465/consoleFull)** for PR 10451 at commit [`7899312`](https://github.com/apache/spark/commit/7899312b7959a495d39d1ff392c573cdf7aa6755).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167972019
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167955427
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167955424
  
**[Test build #48461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48461/consoleFull)** for PR 10451 at commit [`7cf955f`](https://github.com/apache/spark/commit/7cf955f5fd0546af85f247054a8a48e3fd54ae2e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167955428
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48461/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167957490
  
**[Test build #48465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48465/consoleFull)** for PR 10451 at commit [`7899312`](https://github.com/apache/spark/commit/7899312b7959a495d39d1ff392c573cdf7aa6755).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48558035
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit and no Limit
+// was pushed down before. This heuristic is valid assuming there does not exist any Limit
+// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can
+// be pushed passed should override this function.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Is there a reason to not check left and right separately?





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167845210
  
Thanks for working on this.  I think it's getting pretty close.  A few minor 
cleanups that might be nice:
 - I think we should consider pulling all the Limit rules into their own 
`LimitPushDown` rule.  The reasoning here is twofold: we can clearly document in 
one central place the requirements with respect to implementing maxRows, and it 
will be easier to turn the rule off if it is ever doing the wrong thing.
 - We should do a pass and add `maxRows` to any other logical plans where it 
makes sense (see the sketch after this list).  Off the top of my head:
  - Filter = `child.maxRows`
  - Union = `for(leftMax <- left.maxRows; rightMax <- right.maxRows) yield Add(leftMax, rightMax)`
  - Distinct = `child.maxRows`
  - Aggregate = `child.maxRows`
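
A minimal, self-contained sketch (toy case classes, not Spark's Catalyst API; all names below are illustrative) of how such `maxRows` overrides could compose:

```scala
// Toy plan hierarchy: maxRows defaults to None, i.e. "unknown/unbounded".
sealed trait Plan { def maxRows: Option[Long] = None }

case class Relation(name: String) extends Plan                 // size unknown
case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)                 // bounded by n
}
case class Filter(child: Plan) extends Plan {
  override def maxRows: Option[Long] = child.maxRows           // never adds rows
}
case class Distinct(child: Plan) extends Plan {
  override def maxRows: Option[Long] = child.maxRows
}
case class Aggregate(child: Plan) extends Plan {
  override def maxRows: Option[Long] = child.maxRows
}
case class Union(left: Plan, right: Plan) extends Plan {
  // Defined only when both sides are bounded.
  override def maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}

object MaxRowsDemo extends App {
  val plan = Union(Limit(5, Relation("a")), Filter(Limit(3, Relation("b"))))
  println(plan.maxRows)   // Some(8)
}
```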





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167902518
  
After rethinking the `Limit` push-down rules, I believe we cannot push Limit 
through any operator that could change the values or the number of rows. Thus, 
so far, the eligible candidates are `Project`, `Union All`, and 
`Outer/LeftOuter/RightOuter Join`. Please correct me if my understanding is not 
right. 

Feel free to let me know if the code needs an update. Thank you! 







[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48581020
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] {
 object DefaultOptimizer extends Optimizer
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ */
+object PushDownLimit extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit or
+// do not have Limit descendants. This heuristic is valid assuming there does not exist
+// any Limit push-down rule that is unable to infer the value of maxRows.
+// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is
+// safe to pushdown Limit through it. Once we add UNION DISTINCT, we will not be able to
+// pushdown Limit.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Yeah, you are right. : )





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48578529
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer {
 }
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ *
+ * Note: This rule has to be done when the logical plan is stable;
+ *   Otherwise, it could impact the other rules.
--- End diff --

I'm not sure what this means?





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48581249
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
--- End diff --

Actually we will push down `Project` through `Limit` in `ColumnPruning`.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48582039
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] {
 object DefaultOptimizer extends Optimizer
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ */
+object PushDownLimit extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit or
+// do not have Limit descendants. This heuristic is valid assuming there does not exist
+// any Limit push-down rule that is unable to infer the value of maxRows.
+// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is
+// safe to pushdown Limit through it. Once we add UNION DISTINCT, we will not be able to
+// pushdown Limit.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Yeah, that also makes sense. Will make the change after these three test runs 
finish. : )





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167916894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48435/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167916892
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167899814
  
**[Test build #48431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48431/consoleFull)** for PR 10451 at commit [`2823a57`](https://github.com/apache/spark/commit/2823a57b14705f961e53f53c9341f42752889474).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48579121
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer {
 }
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ *
+ * Note: This rule has to be done when the logical plan is stable;
+ *   Otherwise, it could impact the other rules.
--- End diff --

For example, if we push `Limit` through `Filter`, `Aggregate`, or `Distinct`, the 
results will be wrong: `df.aggregate().limit(1)` and `df.limit(1).aggregate()` 
will generate different results (a small illustration follows below).

This concern only applies if we push `Limit` through such operators. So far, we 
did not find any eligible operators except `outer/left-outer/right-outer Join` 
and `Union`. Thus, let me revert them back. Thanks!
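
A small, hedged illustration of the aggregate case, assuming a running SparkSession (or SQLContext on older releases) bound to a variable named `spark`; the value names are ours:

```scala
// Global aggregation first, then limit: the count sees all 100 rows.
val aggThenLimit = spark.range(100).groupBy().count().limit(1)   // count = 100

// Limit first, then aggregation: the count sees only 1 row.
val limitThenAgg = spark.range(100).limit(1).groupBy().count()   // count = 1

// The two plans return different results, so a Limit must not be pushed
// below an Aggregate (and Filter/Distinct similarly change row counts).
```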





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48580661
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] {
 object DefaultOptimizer extends Optimizer
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ */
+object PushDownLimit extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit or
+// do not have Limit descendants. This heuristic is valid assuming there does not exist
+// any Limit push-down rule that is unable to infer the value of maxRows.
+// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is
+// safe to pushdown Limit through it. Once we add UNION DISTINCT, we will not be able to
+// pushdown Limit.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Okay, but why not break this into two parts, so that we push to the left when 
the left is not limited and push to the right when the right is not limited? 
Right now you push to both sides if either side is not limited (see the sketch 
below).
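
A rough standalone sketch (toy case classes, not Catalyst code) of that two-sided variant, where each side of the Union receives an extra Limit only if that side is not already bounded:

```scala
sealed trait Plan { def maxRows: Option[Long] = None }
case class Relation(name: String) extends Plan
case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)
}
case class Union(left: Plan, right: Plan) extends Plan

object TwoSidedPushDemo extends App {
  def pushLimitThroughUnion(plan: Plan): Plan = plan match {
    case Limit(n, Union(left, right)) =>
      // Handle each side independently instead of requiring either side to be unbounded.
      val newLeft  = if (left.maxRows.isEmpty)  Limit(n, left)  else left
      val newRight = if (right.maxRows.isEmpty) Limit(n, right) else right
      Limit(n, Union(newLeft, newRight))
    case other => other
  }

  // Only the unbounded right side receives the extra Limit here.
  println(pushLimitThroughUnion(Limit(10, Union(Limit(3, Relation("a")), Relation("b")))))
}
```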





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167916591
  
**[Test build #48435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48435/consoleFull)** for PR 10451 at commit [`ca5c104`](https://github.com/apache/spark/commit/ca5c10467be387cc41c94018b1d7232369451102).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167900829
  
**[Test build #48433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48433/consoleFull)** for PR 10451 at commit [`cfbeea7`](https://github.com/apache/spark/commit/cfbeea767ff60f9125be2eee52c0ecc7e952d7c4).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48579360
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit and no Limit
+// was pushed down before. This heuristic is valid assuming there does not exist any Limit
+// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can
+// be pushed passed should override this function.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Below is an example. Even if one side has a Limit child/descendant, we can 
still push the limit down to reduce the number of returned rows: 


https://github.com/gatorsmile/spark/blob/unionLimit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushdownLimitsSuite.scala#L50-L57






[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167903020
  
**[Test build #48435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48435/consoleFull)** for PR 10451 at commit [`ca5c104`](https://github.com/apache/spark/commit/ca5c10467be387cc41c94018b1d7232369451102).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48581133
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] {
 object DefaultOptimizer extends Optimizer
 
 /**
+ * Pushes down Limit for reducing the amount of returned data.
+ *
+ * 1. Adding Extra Limit beneath the operations, including Union All.
+ * 2. Project is pushed through Limit in the rule ColumnPruning
+ *
+ * Any operator that a Limit can be pushed passed should override the maxRows function.
+ */
+object PushDownLimit extends Rule[LogicalPlan] {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+
+// Adding extra Limit below UNION ALL iff both left and right childs are not Limit or
+// do not have Limit descendants. This heuristic is valid assuming there does not exist
+// any Limit push-down rule that is unable to infer the value of maxRows.
+// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So, it is
+// safe to pushdown Limit through it. Once we add UNION DISTINCT, we will not be able to
+// pushdown Limit.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Should we also check the limit value? If the `maxRows` is larger than the 
limit we want to push down, it seems it still makes sense to push it down 
(see the sketch below).
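
A sketch of that refinement on the same kind of toy model (again not Spark code; all names are illustrative): push the Limit under a side whenever that side's bound is missing or looser than the limit being pushed:

```scala
sealed trait Plan { def maxRows: Option[Long] = None }
case class Relation(name: String) extends Plan
case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)
}
case class Union(left: Plan, right: Plan) extends Plan

object LimitValueCheckDemo extends App {
  // Add a Limit only when it actually tightens the bound on that side.
  def maybeLimit(n: Long, side: Plan): Plan = side.maxRows match {
    case Some(m) if m <= n => side           // already at least as tight
    case _                 => Limit(n, side)
  }

  def pushLimit(plan: Plan): Plan = plan match {
    case Limit(n, Union(l, r)) => Limit(n, Union(maybeLimit(n, l), maybeLimit(n, r)))
    case other                 => other
  }

  // Limit(100, ...) on the left is looser than 5, so both sides get a new Limit(5, ...).
  println(pushLimit(Limit(5, Union(Limit(100, Relation("a")), Relation("b")))))
}
```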





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167932525
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48433/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167932503
  
**[Test build #48433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48433/consoleFull)** for PR 10451 at commit [`cfbeea7`](https://github.com/apache/spark/commit/cfbeea767ff60f9125be2eee52c0ecc7e952d7c4).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167932524
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167931936
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167931892
  
**[Test build #48431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48431/consoleFull)** for PR 10451 at commit [`2823a57`](https://github.com/apache/spark/commit/2823a57b14705f961e53f53c9341f42752889474).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167931938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48431/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167954060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48455/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167954059
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167952117
  
The latest version covers both cases:
1) If the pushed limit is smaller than a child's `maxRows`, add the extra limit.
2) If a child's `maxRows` is `None`, add the extra limit. 

Hopefully, you like the latest implementation. : ) @marmbrus @cloud-fan 





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48590465
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
--- End diff --

Yeah, we need to override the function in both directions. Let me update 
the comments. Thanks!





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167742397
  
**[Test build #48400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48400/consoleFull)** for PR 10451 at commit [`358d62e`](https://github.com/apache/spark/commit/358d62e7736191420f0d0a364269baa4fa54b0cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167742459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48400/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167742457
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48517911
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL if both left and right childs are not Limit.
+// This heuristic is valid assuming there does not exist any Limit push-down rule.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

is `left.maxRows.isEmpty` equal to `check if left is a Limit`?





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48518043
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL if both left and right childs are not Limit.
+// This heuristic is valid assuming there does not exist any Limit push-down rule.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

Actually I think this branch is safe without this check; did I miss 
something here?





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48518201
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL if both left and right childs are not Limit.
+// This heuristic is valid assuming there does not exist any Limit push-down rule.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
+  Limit(exp,
+Union(
+  CombineLimits(Limit(exp, left)),
+  CombineLimits(Limit(exp, right
--- End diff --

We can get rid of this manual call now.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48518190
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper {
 )
   )
 
+// Adding extra Limit below UNION ALL if both left and right childs are not Limit.
+// This heuristic is valid assuming there does not exist any Limit push-down rule.
+case Limit(exp, Union(left, right))
+  if left.maxRows.isEmpty || right.maxRows.isEmpty =>
--- End diff --

The goal is to avoid double pushdown even if the limit has been pushed past 
another operator (e.g., a Project).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48518228
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }
 
  /**
+   * Returns the limited number of rows to be returned.
--- End diff --

Specify that any operator that a `Limit` can be pushed past should 
override this function.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167700694
  
> add a comment and explain the current solution. In the future, if we add 
such an operator, we can change the current way and fix the issue? (Already 
added a comment in the code)

I like this option





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48518442
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
--- End diff --

And, thus, we should fix `Project` too.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48501449
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -448,7 +448,21 @@ case class Pivot(
   }
 }
 
-case class Limit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode {
+/** Factory for constructing new `Limit` nodes. */
+object Limit {
+  def apply(limitExpr: Expression, child: LogicalPlan): Limit = {
+new Limit(limitExpr, child, optimized = false)
+  }
+}
+
+/**
+ * Take the first `limitExpr` rows.
+ * @param limitExpr The number of returned rows
+ * @param child Child operator
+ * @param optimized This node has been optimized. Note that this is only a flag marker used
+ *  to avoid adding extra `Limit` nodes to the child operators more than once.
+ */
+case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode {
--- End diff --

I'm not a huge fan of modifying the logical plan to include extra details 
about what has been optimized.  This is an API that is used externally even 
though we don't make the same promises about it.

Can we add `def limit: Option[Expression]` or something to `LogicalPlan`?  
Nodes that are 1-1 can just output the limit of their child.  The default 
implementation can return `None`.  You can then check this before inserting a 
limit in the optimizer to make sure that it's not a duplicate (see the sketch 
below).
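
A rough standalone sketch of that alternative (toy types only; the real proposal uses `Option[Expression]` on `LogicalPlan`, simplified here to `Option[Int]`):

```scala
sealed trait LogicalPlan { def limit: Option[Int] = None }     // default: no known limit
case class Relation(name: String) extends LogicalPlan
case class Limit(n: Int, child: LogicalPlan) extends LogicalPlan {
  override def limit: Option[Int] = Some(n)
}
case class Project(child: LogicalPlan) extends LogicalPlan {
  override def limit: Option[Int] = child.limit                // 1-1 node: forward the child's limit
}

object LimitLookThroughDemo extends App {
  // The optimizer can consult `limit` before inserting another Limit node,
  // even when the existing Limit sits below a Project.
  val plan = Project(Limit(10, Relation("t")))
  println(plan.limit.isDefined)   // true: no duplicate Limit needed
}
```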





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48501690
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -448,7 +448,21 @@ case class Pivot(
   }
 }
 
-case class Limit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode {
+/** Factory for constructing new `Limit` nodes. */
+object Limit {
+  def apply(limitExpr: Expression, child: LogicalPlan): Limit = {
+new Limit(limitExpr, child, optimized = false)
+  }
+}
+
+/**
+ * Take the first `limitExpr` rows.
+ * @param limitExpr The number of returned rows
+ * @param child Child operator
+ * @param optimized This node has been optimized. Note that this is only a flag marker used
+ *  to avoid adding extra `Limit` nodes to the child operators more than once.
+ */
+case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode {
--- End diff --

Sure, will do it. Thanks!





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48523071
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
--- End diff --

Fixed. Thanks!





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167677723
  
**[Test build #48379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48379/consoleFull)**
 for PR 10451 at commit 
[`6998ec9`](https://github.com/apache/spark/commit/6998ec9d091260c63b40964997126838812bbf03).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167678849
  
A few updates are done, but I am not sure whether the changes are appropriate.

1. `limit` is already used in 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L251.
 Thus, I had to choose a different function name, `limitedNumRows`.
2. The new changes do not resolve the issue found by @cloud-fan: "If the 
left or right is an operator that can push Limit down (currently there is no 
such operator, but we can't guarantee there won't be)." To resolve this issue, 
we have three options: 
  * Compute `limitedNumRows` for each logical plan operator (except `LeafNode`) 
based on its children's `limitedNumRows`. However, we are unable to do that for 
some operators (e.g., `Join` and `Generate`). That means that although we can 
correctly compute `limitedNumRows`, we cannot detect whether the `Limit` has 
already been pushed down when the child plan contains these nodes (see the 
sketch after this comment).
  * Write a recursive function that traverses all child nodes based on their 
operator types. This might be over-engineered for this case. 
  * Add a comment explaining the current solution; if we ever add such an 
operator, we can revisit the code and fix the issue then.
3. The current approach will not add an extra `Limit` if the user has already 
added a `Limit` to both child nodes. This is shown in the newly added test case 
https://github.com/gatorsmile/spark/blob/6998ec9d091260c63b40964997126838812bbf03/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationPushDownSuite.scala#L83-L91

Please feel free to let me know if any changes are needed.
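
To make the first option concrete, here is a rough sketch under the same toy model 
used earlier in this thread (illustrative names only, not the actual rule): the 
rewrite only wraps a `Union` child in a new `Limit` when the child's known row 
bound does not already guarantee the limit.

    // Toy rewrite, reusing the Plan/Limit/Union sketch from the earlier comment.
    // A Union child is wrapped in a new Limit only when its known bound does not
    // already guarantee at most n rows, so repeated runs reach a fixed point.
    def pushLimitThroughUnion(plan: Plan): Plan = plan match {
      case Limit(n, Union(left, right)) =>
        def cap(child: Plan): Plan =
          if (child.maxRows.exists(_ <= n)) child else Limit(n, child)
        Limit(n, Union(cap(left), cap(right)))
      case other => other
    }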





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48512926
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
+   */
+  def limitedNumRows: Option[Expression] = None
--- End diff --

`maxRows` or `rowLimit` ?





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167679998
  
**[Test build #48380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48380/consoleFull)**
 for PR 10451 at commit 
[`09a5672`](https://github.com/apache/spark/commit/09a56729669d2b3338b6d3b205008348519557f3).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48513000
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -91,6 +91,11 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the limited number of rows to be returned.
+   */
+  def limitedNumRows: Option[Expression] = None
--- End diff --

Sure, will change it.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167688592
  
**[Test build #48379 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48379/consoleFull)**
 for PR 10451 at commit 
[`6998ec9`](https://github.com/apache/spark/commit/6998ec9d091260c63b40964997126838812bbf03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167688631
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48379/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167688630
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167689297
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48380/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167689296
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167689240
  
**[Test build #48380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48380/consoleFull)**
 for PR 10451 at commit 
[`09a5672`](https://github.com/apache/spark/commit/09a56729669d2b3338b6d3b205008348519557f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167729202
  
**[Test build #48400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48400/consoleFull)**
 for PR 10451 at commit 
[`358d62e`](https://github.com/apache/spark/commit/358d62e7736191420f0d0a364269baa4fa54b0cb).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167213191
  
**[Test build #48322 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48322/consoleFull)**
 for PR 10451 at commit 
[`3ccf3bd`](https://github.com/apache/spark/commit/3ccf3bd11f5ee89905c4a5cc4466fd4d332d0b42).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode`





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167213359
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48322/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167213357
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48441937
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,16 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right), optimized) if !optimized =>
--- End diff --

Thank you! 





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48441867
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,16 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right), optimized) if !optimized =>
--- End diff --

This works, but it doesn't look good to me. I don't have a better idea for it 
either; maybe we should just add comments to explain the assumption here.

cc @marmbrus @yhuai 





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48432615
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right)) =>
+  Limit(exp,
+Union(
+  CombineLimits(Limit(exp, left)),
+  CombineLimits(Limit(exp, right))
+)
+  )
--- End diff --

After thinking about it more, there may be a problem: if `left` or `right` is an 
operator that can push `Limit` down (currently there is no such operator, but we 
can't guarantee there never will be), then every time you push a `Limit` down 
here, it will be pushed down further. In that case `CombineLimits` can NOT detect 
that you have already pushed the `Limit` down, and this rule keeps generating new 
`Limit` nodes and pushing them down.

I think we should have a better way to detect whether we have already pushed the 
`Limit` down, or add some comments saying that this rule assumes the newly added 
`Limit` on top of `left` and `right` won't be removed by other optimization rules.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48432925
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right)) =>
+  Limit(exp,
+Union(
+  CombineLimits(Limit(exp, left)),
+  CombineLimits(Limit(exp, right))
+)
+  )
--- End diff --

You are right: `Limit` might not converge to the same position after multiple 
pushdowns.

Let me think about it. Thank you!





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167203391
  
**[Test build #48320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48320/consoleFull)**
 for PR 10451 at commit 
[`ae59f42`](https://github.com/apache/spark/commit/ae59f425457c5311ae2021c3654b6fca8c96f73e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode`





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167203394
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48320/
Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167203393
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167203168
  
**[Test build #48320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48320/consoleFull)**
 for PR 10451 at commit 
[`ae59f42`](https://github.com/apache/spark/commit/ae59f425457c5311ae2021c3654b6fca8c96f73e).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167202389
  
@cloud-fan I just added a quick fix that introduces a stop flag to avoid adding 
an extra `Limit` in subsequent iterations, but I am not sure whether this 
approach is good.

After reading the code, another option might be to reuse what we did for the 
`_analyzed` flag, as shown in the following code:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala/#L30-L35
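
For readers following along, here is a simplified sketch of that `_analyzed`-style 
pattern (the flag name below is hypothetical and only illustrates the idea):

    // Simplified illustration of the mutable-flag pattern referenced above.
    // The field name _limitPushedDown is hypothetical.
    abstract class FlaggedPlan {
      private[this] var _limitPushedDown: Boolean = false
      def setLimitPushedDown(): Unit = { _limitPushedDown = true }
      def limitPushedDown: Boolean = _limitPushedDown
    }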





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167205352
  
**[Test build #48322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48322/consoleFull)**
 for PR 10451 at commit 
[`3ccf3bd`](https://github.com/apache/spark/commit/3ccf3bd11f5ee89905c4a5cc4466fd4d332d0b42).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167067050
  
**[Test build #48299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48299/consoleFull)**
 for PR 10451 at commit 
[`7f25d91`](https://github.com/apache/spark/commit/7f25d910e579449138845c0241405a29a97450fa).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167079554
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167079556
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48299/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167079472
  
**[Test build #48299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48299/consoleFull)**
 for PR 10451 at commit 
[`7f25d91`](https://github.com/apache/spark/commit/7f25d910e579449138845c0241405a29a97450fa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166963932
  
**[Test build #48248 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48248/consoleFull)**
 for PR 10451 at commit 
[`56fd782`](https://github.com/apache/spark/commit/56fd78267618a3b3aab73ca01d682f4fb25a638c).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/10451

[SPARK-12503] [SQL] Pushing Limit Through Union

"Rule that applies to a Limit on top of a Union. The original Limit won't 
go away after applying this rule, but additional Limit nodes will be created on 
top of each child of Union, so that these children produce less rows and Limit 
can be further optimized for children Relations."

– from https://issues.apache.org/jira/browse/CALCITE-832
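
In plan-shape terms, the rewrite looks roughly like the following, expressed with 
the small toy model sketched in the review discussion above (purely illustrative, 
not the actual Catalyst rule):

    // Limit on top of Union becomes Limit over a Union of limited children.
    val before = Limit(3, Union(Relation("t1"), Relation("t2")))
    val after  = pushLimitThroughUnion(before)
    // after == Limit(3, Union(Limit(3, Relation("t1")), Limit(3, Relation("t2"))))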

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark unionLimit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10451


commit 56fd78267618a3b3aab73ca01d682f4fb25a638c
Author: gatorsmile 
Date:   2015-12-23T18:08:03Z

union limit pushdown.







[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167013142
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167013085
  
**[Test build #48261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48261/consoleFull)**
 for PR 10451 at commit 
[`77105e3`](https://github.com/apache/spark/commit/77105e391a0adf6a6e0a3891a1c692e6bcdaf9f4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167013143
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48261/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48391800
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right)) =>
+  Limit(exp,
+Union(
+  CombineLimits(Limit(exp, left)),
+  CombineLimits(Limit(exp, right))
+)
+  )
--- End diff --

we need a stop condition, or it will keep pushing `Limit` forever





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48392104
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right)) =>
+  Limit(exp,
+Union(
+  CombineLimits(Limit(exp, left)),
+  CombineLimits(Limit(exp, right))
+)
+  )
--- End diff --

Since we call `CombineLimits` here, we will not add an extra `Limit` in the 
second iteration, so the plan will not change further and the rule will stop 
automatically, right?
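
For intuition, here is a toy version of that merging step under the same 
illustrative model used earlier in the thread (not the real `CombineLimits` rule):

    // Collapse directly nested limits into one, keeping the smaller bound.
    def combineLimits(plan: Plan): Plan = plan match {
      case Limit(outer, Limit(inner, child)) => Limit(math.min(outer, inner), child)
      case other => other
    }
    // If the pushed Limit stays directly on top of each Union child, a second run
    // of the push-down rule yields Limit(n, Limit(n, child)), which collapses back
    // to Limit(n, child); the plan stops changing and the optimizer terminates.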





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166980946
  
@marmbrus @rxin Could you check if this is an appropriate improvement for 
Spark too?

Thanks!





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166980561
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166980563
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48248/
Test PASSed.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-167005403
  
**[Test build #48261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48261/consoleFull)**
 for PR 10451 at commit 
[`77105e3`](https://github.com/apache/spark/commit/77105e391a0adf6a6e0a3891a1c692e6bcdaf9f4).





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10451#issuecomment-166980458
  
**[Test build #48248 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48248/consoleFull)**
 for PR 10451 at commit 
[`56fd782`](https://github.com/apache/spark/commit/56fd78267618a3b3aab73ca01d682f4fb25a638c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...

2015-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10451#discussion_r48377323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
with PredicateHelper {
 )
   )
 
+// Push down limit into union
+case Limit(exp, Union(left, right)) =>
+  Limit(exp,
+Union(
+  Limit(exp, left),
+  Limit(exp, right)
+)
+  )
--- End diff --

A bug exists here. Will fix it soon. Thanks!

