[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-170757463 Let's close the limit push down pull requests. We will need to design this more properly because it is expensive to push down large limits. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-170757733 Sure, let me close it.
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/10451
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167954886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48456/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167954987 **[Test build #48461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48461/consoleFull)** for PR 10451 at commit [`7cf955f`](https://github.com/apache/spark/commit/7cf955f5fd0546af85f247054a8a48e3fd54ae2e).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167954885 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167972021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48465/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167971953 **[Test build #48465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48465/consoleFull)** for PR 10451 at commit [`7899312`](https://github.com/apache/spark/commit/7899312b7959a495d39d1ff392c573cdf7aa6755). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167972019 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167955427 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167955424 **[Test build #48461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48461/consoleFull)** for PR 10451 at commit [`7cf955f`](https://github.com/apache/spark/commit/7cf955f5fd0546af85f247054a8a48e3fd54ae2e). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167955428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48461/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167957490 **[Test build #48465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48465/consoleFull)** for PR 10451 at commit [`7899312`](https://github.com/apache/spark/commit/7899312b7959a495d39d1ff392c573cdf7aa6755).
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48558035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL iff both left and right children are not Limit and no Limit +// was pushed down before. This heuristic is valid assuming there does not exist any Limit +// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can +// be pushed past should override this function. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Is there a reason not to check left and right separately?
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167845210 Thanks for working on this. I think it's getting pretty close. A few minor cleanups that might be nice: - I think we should consider pulling all the Limit rules into their own `LimitPushDown` rule. The reasoning here is twofold: we can clearly comment in one central place on the requirements with respect to implementing maxRows, and it will be easier to turn off if it is ever doing the wrong thing. - We should do a pass through and add `maxRows` to any other logical plans where it makes sense. Off the top of my head: - Filter = `child.maxRows` - Union = `for(leftMax <- left.maxRows; rightMax <- right.maxRows) yield Add(leftMax, rightMax)` - Distinct = `child.maxRows` - Aggregate = `child.maxRows`
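The list of suggested `maxRows` definitions can be sketched against a toy plan hierarchy. This is a hypothetical, heavily simplified model for illustration; the `Plan` trait and the case classes below are stand-ins, not Spark's actual Catalyst classes:

```scala
// Toy logical-plan hierarchy (NOT Spark's Catalyst classes) illustrating the
// suggested maxRows definitions. None means "row count unknown/unbounded".
sealed trait Plan {
  def maxRows: Option[Long] = None
}
case class Relation(name: String) extends Plan            // unbounded source
case class Limit(n: Long, child: Plan) extends Plan {
  // A limit bounds its child; take the tighter of the two bounds.
  override def maxRows: Option[Long] =
    Some(child.maxRows.fold(n)(math.min(n, _)))
}
case class Filter(child: Plan) extends Plan {
  override def maxRows: Option[Long] = child.maxRows      // never adds rows
}
case class Distinct(child: Plan) extends Plan {
  override def maxRows: Option[Long] = child.maxRows      // only removes rows
}
case class Union(left: Plan, right: Plan) extends Plan {
  // Bounded only when BOTH sides are bounded, matching the suggested
  // for-comprehension over the two Options.
  override def maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}
```

For example, `Union(Limit(3, Relation("a")), Limit(5, Relation("b"))).maxRows` is `Some(8)`, while a union with one unbounded side yields `None`, which is exactly the condition the pushdown rule keys on.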
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167902518 After rethinking the `Limit` push-down rules, we are unable to push Limit through any operator that could change the values or the number of rows. Thus, so far, the eligible candidates are `Project`, `Union All` and `Outer/LeftOuter/RightOuter Join`. Please correct me if my understanding is not right, and feel free to let me know if the code needs an update. Thank you!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So it is +// safe to push Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, you are right. : )
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48578529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer { } /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + * + * Note: This rule has to be done when the logical plan is stable; + * Otherwise, it could impact the other rules. --- End diff -- I'm not sure what this means? ---
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Actually we will push down `Project` through `Limit` in `ColumnPruning`.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48582039 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So it is +// safe to push Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Yeah, that also makes sense. Will make the change after these three running test cases finish. : )
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167916894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48435/ Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167916892 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167899814 **[Test build #48431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48431/consoleFull)** for PR 10451 at commit [`2823a57`](https://github.com/apache/spark/commit/2823a57b14705f961e53f53c9341f42752889474).
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48579121 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -69,6 +73,33 @@ object DefaultOptimizer extends Optimizer { } /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + * + * Note: This rule has to be done when the logical plan is stable; + * Otherwise, it could impact the other rules. --- End diff -- For example, if we push `Limit` through `Filter`, `Aggregate` or `Distinct`, the results will be wrong: `df.aggregate().limit(1)` and `df.limit(1).aggregate()` will generate different results. This is a problem only for operators through which `Limit` cannot safely be pushed. So far, we have not found any eligible operators except `Outer/LeftOuter/RightOuter Join` and `Union`. Thus, let me revert the others. Thanks!
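The aggregate example can be seen with plain Scala collections, no Spark required; here `sum` stands in for an arbitrary aggregate:

```scala
// Plain Scala collections (not Spark) showing why Limit cannot be pushed
// through Aggregate: limiting first changes the aggregate's INPUT rows,
// not just the number of output rows.
val data = Seq(10, 20, 30)

val aggregateThenLimit = Seq(data.sum).take(1)  // aggregate over all rows
val limitThenAggregate = Seq(data.take(1).sum)  // aggregate over one row only

// The two orders disagree, so the rewrite would be incorrect.
assert(aggregateThenLimit == Seq(60))
assert(limitThenAggregate == Seq(10))
```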
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48580661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So it is +// safe to push Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Okay, but why not break this into two parts, so that we push to the left when the left is not limited and push to the right when the right is not limited? Right now you push to both sides if either side is not limited.
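The two-part shape being suggested might look like the following sketch, written against a toy plan type rather than Catalyst's: each side of the union is examined independently, and a limit is added only on a side that is not already bounded.

```scala
// Hypothetical mini-rule (NOT Spark's actual PushDownLimit) pushing a Limit
// into each side of a Union independently.
sealed trait Plan { def maxRows: Option[Long] = None }
case class Relation(name: String) extends Plan
case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)
}
case class Union(left: Plan, right: Plan) extends Plan

def pushLimitThroughUnion(plan: Plan): Plan = plan match {
  case Limit(n, Union(left, right)) =>
    // Wrap a side in Limit(n) only if its row count is unbounded; a side
    // that already carries a bound is left untouched.
    def maybeLimit(side: Plan): Plan =
      if (side.maxRows.isEmpty) Limit(n, side) else side
    // The outer Limit stays: the union of two limited sides can still
    // produce up to 2n rows.
    Limit(n, Union(maybeLimit(left), maybeLimit(right)))
  case other => other
}
```

With `Limit(5, Union(Limit(2, a), b))`, only the right side gains a new `Limit(5, ...)`; the existing `Limit(2, ...)` on the left is preserved.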
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167916591 **[Test build #48435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48435/consoleFull)** for PR 10451 at commit [`ca5c104`](https://github.com/apache/spark/commit/ca5c10467be387cc41c94018b1d7232369451102). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167900829 **[Test build #48433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48433/consoleFull)** for PR 10451 at commit [`cfbeea7`](https://github.com/apache/spark/commit/cfbeea767ff60f9125be2eee52c0ecc7e952d7c4).
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48579360 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,17 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL iff both left and right children are not Limit and no Limit +// was pushed down before. This heuristic is valid assuming there does not exist any Limit +// push-down rule that is unable to infer the value of maxRows. Any operator that a Limit can +// be pushed past should override this function. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Below is an example. If one side has a Limit child/descendant, we can still push the Limit down to reduce the number of returned rows. https://github.com/gatorsmile/spark/blob/unionLimit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushdownLimitsSuite.scala#L50-L57
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167903020 **[Test build #48435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48435/consoleFull)** for PR 10451 at commit [`ca5c104`](https://github.com/apache/spark/commit/ca5c10467be387cc41c94018b1d7232369451102).
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48581133 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -80,6 +81,33 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] { object DefaultOptimizer extends Optimizer /** + * Pushes down Limit to reduce the amount of returned data. + * + * 1. Adds an extra Limit beneath eligible operations, including Union All. + * 2. Project is pushed through Limit in the rule ColumnPruning. + * + * Any operator that a Limit can be pushed past should override the maxRows function. + */ +object PushDownLimit extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { + +// Adding extra Limit below UNION ALL iff both left and right children are not Limit or +// do not have Limit descendants. This heuristic is valid assuming there does not exist +// any Limit push-down rule that is unable to infer the value of maxRows. +// Note, right now, Union means UNION ALL, which does not de-duplicate rows. So it is +// safe to push Limit through it. Once we add UNION DISTINCT, we will not be able to +// push down Limit. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Should we also check the limit value? If the `maxRows` is larger than the limit we want to push down, it seems it still makes sense to push it down?
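The refinement of deciding by the limit's value rather than by the mere presence of a bound reduces to a small predicate. This helper is hypothetical, sketched over `Option[Long]` bounds rather than real plans:

```scala
// Is it worth pushing a new Limit(n) below a union side whose known bound
// is maxRows? Push when the side is unbounded (None), or when its bound is
// looser than n, so the extra Limit still shrinks the data.
def worthPushing(n: Long, maxRows: Option[Long]): Boolean =
  maxRows.forall(_ > n)
```

`worthPushing(5, None)` and `worthPushing(5, Some(10))` hold, while `worthPushing(5, Some(3))` does not: that side is already tighter than the new limit, so adding another Limit gains nothing.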
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
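The heuristic under review, including the limit-value check cloud-fan asks about, can be sketched as follows. The simplified plan classes below are illustrative stand-ins, not Catalyst's real API:

```scala
// Minimal sketch of pushing a Limit through UNION ALL. A new Limit is added
// under the Union only when a child has no known row bound (maxRows is None)
// or its bound is looser than the limit being pushed down.
sealed trait Plan { def maxRows: Option[Long] = None }
case class Scan(name: String) extends Plan
case class Limit(n: Long, child: Plan) extends Plan {
  override def maxRows: Option[Long] = Some(n)
}
case class Union(left: Plan, right: Plan) extends Plan {
  // UNION ALL can bound its output only when both sides are bounded.
  override def maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}

def pushLimitThroughUnion(plan: Plan): Plan = plan match {
  case Limit(n, Union(left, right)) =>
    def maybeLimit(child: Plan): Plan = child.maxRows match {
      case Some(m) if m <= n => child // already at least as restrictive
      case _                 => Limit(n, child)
    }
    Limit(n, Union(maybeLimit(left), maybeLimit(right)))
  case other => other
}
```

For example, pushing `Limit(5)` through `Union(Scan("a"), Limit(3, Scan("b")))` adds an extra `Limit(5)` only on the left side, since the right side is already bounded by 3.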
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167932525 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48433/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167932503 **[Test build #48433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48433/consoleFull)** for PR 10451 at commit [`cfbeea7`](https://github.com/apache/spark/commit/cfbeea767ff60f9125be2eee52c0ecc7e952d7c4). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167932524 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167931936 Build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167931892 **[Test build #48431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48431/consoleFull)** for PR 10451 at commit [`2823a57`](https://github.com/apache/spark/commit/2823a57b14705f961e53f53c9341f42752889474). * This patch **fails from timeout after a configured wait of `250m`**. * This patch **does not merge cleanly**. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167931938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48431/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167954060 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48455/ Test FAILed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167954059 Merged build finished. Test FAILed.
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167952117 The latest version covers both cases: 1) If the children's `maxRows` is larger than the limit being pushed down, add the extra limit. 2) If the children's `maxRows` is `None`, add the extra limit. Hopefully, you like the latest implementation. : ) @marmbrus @cloud-fan
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48590465 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Yeah, we need to override the function in both directions. Let me update the comments. Thanks!
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167742397 **[Test build #48400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48400/consoleFull)** for PR 10451 at commit [`358d62e`](https://github.com/apache/spark/commit/358d62e7736191420f0d0a364269baa4fa54b0cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167742459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48400/ Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167742457 Merged build finished. Test PASSed.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48517911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding an extra Limit below UNION ALL if both the left and right children are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- is `left.maxRows.isEmpty` equivalent to `check if left is a Limit`?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding an extra Limit below UNION ALL if both the left and right children are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Actually I think this branch is safe without this check; did I miss something here?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518201 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding an extra Limit below UNION ALL if both the left and right children are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right --- End diff -- We can get rid of this manual call now.
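The manual `CombineLimits(...)` wrapping is redundant because a combine-limits style rewrite collapses stacked Limits to the tighter bound anyway once the optimizer batch runs to fixpoint. A rough illustration of that rewrite, using simplified stand-in types rather than Catalyst's real API:

```scala
// Two stacked Limit nodes reduce to a single Limit with the smaller bound,
// so any Limit added directly on top of an existing one is absorbed.
sealed trait Plan
case class Scan(name: String) extends Plan
case class Limit(n: Long, child: Plan) extends Plan

def combineLimits(plan: Plan): Plan = plan match {
  case Limit(outer, Limit(inner, child)) => Limit(math.min(outer, inner), child)
  case other                             => other
}
```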
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518190 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding an extra Limit below UNION ALL if both the left and right children are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- The goal is to avoid double pushdown even if the limit has been pushed past another operator (e.g., a project).
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518228 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Specify that any operator that a `Limit` can be pushed past should override this function.
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167700694 > add a comment and explain the current solution. In the future, if we add such an operator, we can change the current way and fix the issue? (Already added a comment in the code) I like this option
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518442 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- And, thus, we should fix `Project` too.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48501449 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -448,7 +448,21 @@ case class Pivot( } } -case class Limit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode { +/** Factory for constructing new `Limit` nodes. */ +object Limit { + def apply(limitExpr: Expression, child: LogicalPlan): Limit = { +new Limit(limitExpr, child, optimized = false) + } +} + +/** + * Take the first `limitExpr` rows. + * @param limitExpr The number of returned rows + * @param child Child operator + * @param optimized This node has been optimized. Note that this is only a flag marker used + * to avoid adding extra `Limit` nodes to the child operators more than once. + */ +case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode { --- End diff -- I'm not a huge fan of modifying the logical plan to include extra details about what has been optimized. This is an API that is used externally even though we don't make the same promises about it. Can we add `def limit: Option[Expression]` or something to `LogicalPlan`? Nodes that are 1-1 can just output the limit of their child. The default implementation can return `None`. You can then check this before inserting a limit in the optimizer to make sure that it's not a duplicate.
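The hook marmbrus proposes can be sketched as follows: a default-`None` bound on the base plan class, overridden by `Limit`, and simply forwarded by 1-1 operators such as `Project`. The types below are simplified stand-ins for illustration, not Catalyst's real API:

```scala
// A per-node row bound instead of an `optimized` flag on Limit itself.
sealed trait LogicalPlan { def maxRows: Option[Long] = None }
case class Relation(name: String) extends LogicalPlan
case class Limit(n: Long, child: LogicalPlan) extends LogicalPlan {
  override def maxRows: Option[Long] = Some(n)
}
case class Project(cols: Seq[String], child: LogicalPlan) extends LogicalPlan {
  // A 1-1 operator forwards its child's bound, so the optimizer can still
  // detect a Limit even after it has been pushed past the Project.
  override def maxRows: Option[Long] = child.maxRows
}
```

With this, the optimizer checks `plan.maxRows` before inserting a limit, which avoids duplicate pushdown without tagging `Limit` nodes as already optimized.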
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48501690 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -448,7 +448,21 @@ case class Pivot( } } -case class Limit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode { +/** Factory for constructing new `Limit` nodes. */ +object Limit { + def apply(limitExpr: Expression, child: LogicalPlan): Limit = { +new Limit(limitExpr, child, optimized = false) + } +} + +/** + * Take the first `limitExpr` rows. + * @param limitExpr The number of returned rows + * @param child Child operator + * @param optimized This node has been optimized. Note that this is only a flag marker used + * to avoid adding extra `Limit` nodes to the child operators more than once. + */ +case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode { --- End diff -- Sure, will do it. Thanks!
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48523071 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Fixed. Thanks!
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167677723 **[Test build #48379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48379/consoleFull)** for PR 10451 at commit [`6998ec9`](https://github.com/apache/spark/commit/6998ec9d091260c63b40964997126838812bbf03).
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167678849 A few updates are done, but I am not sure if the changes are appropriate.

1. `limit` is already used in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L251. Thus, I have to choose a different function name, `limitedNumRows`.
2. The new changes did not resolve the issue found by @cloud-fan: "If the left or right is an operator that can push Limit down (currently there is no such operator, but we can't guarantee there won't be)." To resolve this issue, we are facing three options:
   * Generate the value of `limitedNumRows` for each logical plan operator (except LeafNode) based on its children's `limitedNumRows`. However, we are unable to do that for some operators (e.g., `Join` and `Generate`). That means that although we can correctly get the value of `limitedNumRows`, we are unable to detect whether the Limit has been pushed down if the child plan has these nodes.
   * Write a recursive function that traverses all the child nodes based on their operator types. This might be over-engineered for this case.
   * Add a comment and explain the current solution. In the future, if we add such an operator, we can change the current code and fix the issue.
3. The current way will not add an extra `Limit` if the user already added `Limit` in both child nodes. This is shown in the newly added test case https://github.com/gatorsmile/spark/blob/6998ec9d091260c63b40964997126838812bbf03/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationPushDownSuite.scala#L83-L91

Please feel free to let me know if I need to do any change.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48512926 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. + */ + def limitedNumRows: Option[Expression] = None --- End diff -- `maxRows` or `rowLimit`?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167679998 **[Test build #48380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48380/consoleFull)** for PR 10451 at commit [`09a5672`](https://github.com/apache/spark/commit/09a56729669d2b3338b6d3b205008348519557f3).
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48513000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. + */ + def limitedNumRows: Option[Expression] = None --- End diff -- sure, will change it.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167688592 **[Test build #48379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48379/consoleFull)** for PR 10451 at commit [`6998ec9`](https://github.com/apache/spark/commit/6998ec9d091260c63b40964997126838812bbf03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167688631 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48379/ Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167688630 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167689297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48380/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167689296 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167689240 **[Test build #48380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48380/consoleFull)** for PR 10451 at commit [`09a5672`](https://github.com/apache/spark/commit/09a56729669d2b3338b6d3b205008348519557f3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167729202 **[Test build #48400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48400/consoleFull)** for PR 10451 at commit [`358d62e`](https://github.com/apache/spark/commit/358d62e7736191420f0d0a364269baa4fa54b0cb).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167213191 **[Test build #48322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48322/consoleFull)** for PR 10451 at commit [`3ccf3bd`](https://github.com/apache/spark/commit/3ccf3bd11f5ee89905c4a5cc4466fd4d332d0b42). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167213359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48322/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167213357 Merged build finished. Test PASSed.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48441937 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,16 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right), optimized) if !optimized => --- End diff -- Thank you!
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48441867 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,16 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right), optimized) if !optimized => --- End diff -- This works, but it doesn't look good to me. I don't have a better idea either; maybe we should just add comments to explain the assumption here. cc @marmbrus @yhuai
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48432615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right)) => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right)) +) + ) --- End diff -- After thinking about it more, there may be a problem: if `left` or `right` is an operator that a `Limit` can be pushed through (currently there is no such operator, but we can't guarantee there never will be), then every time you push down a `Limit` here, it will be pushed down further. In that case `CombineLimits` can NOT detect that you have already pushed the `Limit` down, and the rule keeps generating new `Limit`s and pushing them down. I think we should have a better way to detect whether we have already pushed the `Limit` down, or add a comment saying that this rule assumes the newly added `Limit`s on top of `left` and `right` won't be removed by other optimization rules.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48432925 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right)) => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right)) +) + ) --- End diff -- You are right. `Limit` might not converge to the same position after multiple pushdowns. Let me think about it. Thank you!
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167203391 **[Test build #48320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48320/consoleFull)** for PR 10451 at commit [`ae59f42`](https://github.com/apache/spark/commit/ae59f425457c5311ae2021c3654b6fca8c96f73e). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Limit(limitExpr: Expression, child: LogicalPlan, optimized: Boolean) extends UnaryNode`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167203394 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48320/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167203393 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167203168 **[Test build #48320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48320/consoleFull)** for PR 10451 at commit [`ae59f42`](https://github.com/apache/spark/commit/ae59f425457c5311ae2021c3654b6fca8c96f73e).
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167202389 @cloud-fan Just added a quick fix by introducing a stop flag to avoid adding an extra `Limit` in the subsequent iteration, but I am not sure if my way is good. After reading the code, maybe another way is to use what we did for the `_analyzed` flag, as shown in the following code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala/#L30-L35
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167205352 **[Test build #48322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48322/consoleFull)** for PR 10451 at commit [`3ccf3bd`](https://github.com/apache/spark/commit/3ccf3bd11f5ee89905c4a5cc4466fd4d332d0b42).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167067050 **[Test build #48299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48299/consoleFull)** for PR 10451 at commit [`7f25d91`](https://github.com/apache/spark/commit/7f25d910e579449138845c0241405a29a97450fa).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167079554 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167079556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48299/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167079472 **[Test build #48299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48299/consoleFull)** for PR 10451 at commit [`7f25d91`](https://github.com/apache/spark/commit/7f25d910e579449138845c0241405a29a97450fa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-166963932 **[Test build #48248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48248/consoleFull)** for PR 10451 at commit [`56fd782`](https://github.com/apache/spark/commit/56fd78267618a3b3aab73ca01d682f4fb25a638c).
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10451 [SPARK-12503] [SQL] Pushing Limit Through Union "Rule that applies to a Limit on top of a Union. The original Limit won't go away after applying this rule, but additional Limit nodes will be created on top of each child of Union, so that these children produce less rows and Limit can be further optimized for children Relations." (from https://issues.apache.org/jira/browse/CALCITE-832) You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark unionLimit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10451.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10451 commit 56fd78267618a3b3aab73ca01d682f4fb25a638c Author: gatorsmile Date: 2015-12-23T18:08:03Z union limit pushdown.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167013142 Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167013085 **[Test build #48261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48261/consoleFull)** for PR 10451 at commit [`77105e3`](https://github.com/apache/spark/commit/77105e391a0adf6a6e0a3891a1c692e6bcdaf9f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167013143 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48261/
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48391800 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right)) => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right)) +) + ) --- End diff -- we need a stop condition, or it will keep pushing `Limit` forever
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48392104 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right)) => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right)) +) + ) --- End diff -- Since we call `CombineLimits` here, we will not add an extra `Limit` in the second iteration. Thus, the plan will not change on the second iteration, and the rule will stop automatically, right?
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-166980946 @marmbrus @rxin Could you check if this is an appropriate improvement for Spark too? Thanks!
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-166980561 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-166980563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48248/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167005403 **[Test build #48261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48261/consoleFull)** for PR 10451 at commit [`77105e3`](https://github.com/apache/spark/commit/77105e391a0adf6a6e0a3891a1c692e6bcdaf9f4).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-166980458 **[Test build #48248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48248/consoleFull)** for PR 10451 at commit [`56fd782`](https://github.com/apache/spark/commit/56fd78267618a3b3aab73ca01d682f4fb25a638c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48377323 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Push down limit into union +case Limit(exp, Union(left, right)) => + Limit(exp, +Union( + Limit(exp, left), + Limit(exp, right) +) + ) --- End diff -- A bug exists here. Will fix it soon. Thanks!