[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336261 @marmbrus I see what you mean. Updated to basically what you suggested, aside from building the map once. Let me know; once it's finalized I can try to test one more time on live data. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2345#discussion_r17452614

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -47,6 +48,53 @@ object Optimizer extends RuleExecutor[LogicalPlan] {
 }

 /**
+ * Pushes operations to either side of a Union.
+ */
+object UnionPushdown extends Rule[LogicalPlan] {
+
+  /**
+   * Maps Attributes from the left side to the corresponding Attribute on the right side.
+   */
+  def buildRewrites(union: Union): AttributeMap[Attribute] = {
+    assert(union.left.output.size == union.right.output.size)
+
+    AttributeMap(union.left.output.zip(union.right.output))
+  }
+
+  /**
+   * Rewrites an expression so that it can be pushed to the right side of a Union operator.
+   * This method relies on the fact that the output attributes of a union are always equal
+   * to the left child's output.
+   */
+  def pushToRight[A <: Expression](e: A, union: Union, rewrites: AttributeMap[Attribute]): A = {
--- End diff --

Nit: `union` is not used.
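For readers following the nit, here is a minimal sketch of why the `union` argument becomes redundant once the rewrite map is built up front. It does not use Spark: `Col`, `RewriteMapSketch`, and `RewriteMapExample` are hypothetical stand-ins invented for illustration, and a plain `Map` stands in for Catalyst's `AttributeMap`.

```scala
// Hypothetical stand-in for Catalyst's Attribute; this is not a Spark class.
final case class Col(name: String, id: Int)

object RewriteMapSketch {
  // Build the left-to-right mapping once per union, pairing columns by position.
  def buildRewrites(leftOutput: Seq[Col], rightOutput: Seq[Col]): Map[Col, Col] = {
    require(leftOutput.size == rightOutput.size, "union children must have the same arity")
    leftOutput.zip(rightOutput).toMap
  }

  // Only the mapping is needed here, so no union parameter is required.
  // Columns that are not in the map are returned unchanged.
  def pushToRight(cols: Seq[Col], rewrites: Map[Col, Col]): Seq[Col] =
    cols.map(c => rewrites.getOrElse(c, c))
}

object RewriteMapExample {
  val leftOutput: Seq[Col] = Seq(Col("id", 1), Col("v", 2))
  val rightOutput: Seq[Col] = Seq(Col("id", 3), Col("v", 4))

  // The same map can be reused for every expression pushed to the right side.
  val rewrites: Map[Col, Col] = RewriteMapSketch.buildRewrites(leftOutput, rightOutput)
  val pushed: Seq[Col] = RewriteMapSketch.pushToRight(Seq(Col("v", 2), Col("id", 1)), rewrites)
}
```

Passing the map explicitly keeps `pushToRight` a pure function of its inputs, which is presumably why the extra parameter was flagged.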
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336425 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20170/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336546 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/73/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336553 LGTM to me once the tests pass.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55342753 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20176/consoleFull) for PR 2345 at commit [`5c8d24d`](https://github.com/apache/spark/commit/5c8d24d07eaaa35e7b596e047861b663fc10f03d). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55345242 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/73/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CreateTableAsSelect(` * `case class CreateTableAsSelect(`
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55345198 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20170/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CreateTableAsSelect(` * `case class CreateTableAsSelect(`
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55348969 Thanks! I've merged to master.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2345
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55350225 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20176/consoleFull) for PR 2345 at commit [`5c8d24d`](https://github.com/apache/spark/commit/5c8d24d07eaaa35e7b596e047861b663fc10f03d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CreateTableAsSelect(` * `case class CreateTableAsSelect(`
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
GitHub user koeninger opened a pull request:

    https://github.com/apache/spark/pull/2345

    SPARK-3462 push down filters and projections into Unions

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mediacrossinginc/spark SPARK-3462

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2345.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2345

commit ef47b3b80dd92f4652947ccffa5c9fea97adffb0
Author: Cody Koeninger cody.koenin...@mediacrossing.com
Date:   2014-09-10T05:07:58Z

    SPARK-3462 push down filters and projections into Unions
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55117296 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55172311 ok to test
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55172771 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20111/consoleFull) for PR 2345 at commit [`ef47b3b`](https://github.com/apache/spark/commit/ef47b3b80dd92f4652947ccffa5c9fea97adffb0). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55178764

Hey @koeninger, thanks for implementing this optimization! Overall this looks pretty good. A few minor suggestions:

- I'm not sure that we want to check the names and qualifiers to find the corresponding `Attribute`s. It is possible that either side could actually have tables that are aliased differently and thus the `Attribute`s would have different qualifiers. Instead, I think that it is safe to assume that the analysis phase has checked name and type matching and just find the corresponding `Attribute` by ordering. I'm thinking something like this:

```scala
/**
 * Pushes Project and Filter operations to either side of a Union.
 */
object UnionPushdown extends Rule[LogicalPlan] {
  /**
   * Rewrites an expression so that it can be pushed to the right side of a Union operator.
   * This method relies on the fact that the output attributes of a union are always equal to the
   * left child's output.
   */
  def pushToRight[A <: Expression](e: A, union: Union): A = {
    assert(union.left.output.size == union.right.output.size)

    // Maps Attributes from the left side to the corresponding Attribute on the right side.
    val rewrites = AttributeMap(union.left.output.zip(union.right.output))
    val result = e transform {
      case a: Attribute => rewrites(a)
    }

    // We must promise the compiler that we did not discard the names in the case of project
    // expressions. This is safe since the only transformation is from Attribute => Attribute.
    result.asInstanceOf[A]
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Push down filter into union
    case Filter(condition, u @ Union(left, right)) =>
      Union(
        Filter(condition, left),
        Filter(pushToRight(condition, u), right))

    // Push down projection into union
    case Project(projectList, u @ Union(left, right)) =>
      Union(
        Project(projectList, left),
        Project(projectList.map(pushToRight(_, u)), right))
  }
}
```

- Would also be great to add some tests to `FilterPushdownSuite` and maybe create a similar `ColumnPruningSuite`.
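To make the "match by ordering, not by name or qualifier" point concrete, the following is a minimal sketch on a toy plan tree. It is not Catalyst: `Expr`, `Attr`, `Lit`, `GreaterThan`, `Plan`, `Relation`, and the toy `Filter`, `Project`, and `Union` here are simplified, hypothetical stand-ins, and `apply` rewrites only the root node rather than walking the tree as Catalyst's `transform` does.

```scala
// Toy stand-ins for Catalyst's expression and plan types; none of these are Spark classes.
sealed trait Expr
case class Attr(name: String, qualifier: String) extends Expr
case class Lit(value: Int) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

sealed trait Plan { def output: Seq[Attr] }
case class Relation(output: Seq[Attr]) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}
case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList
}
case class Union(left: Plan, right: Plan) extends Plan {
  // A union exposes its left child's output, as the review comment relies on.
  def output: Seq[Attr] = left.output
}

object UnionPushdownSketch {
  // Pair left and right attributes by position, ignoring names and qualifiers.
  def buildRewrites(u: Union): Map[Attr, Attr] = {
    require(u.left.output.size == u.right.output.size)
    u.left.output.zip(u.right.output).toMap
  }

  // Replace each left-side attribute with its right-side counterpart.
  def pushToRight(e: Expr, rewrites: Map[Attr, Attr]): Expr = e match {
    case a: Attr           => rewrites.getOrElse(a, a)
    case GreaterThan(l, r) => GreaterThan(pushToRight(l, rewrites), pushToRight(r, rewrites))
    case other             => other
  }

  // Simplification: rewrites only the root node of the plan.
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, u @ Union(left, right)) =>
      val rewrites = buildRewrites(u)
      Union(Filter(cond, left), Filter(pushToRight(cond, rewrites), right))
    case Project(list, u @ Union(left, right)) =>
      val rewrites = buildRewrites(u)
      Union(Project(list, left), Project(list.map(a => rewrites.getOrElse(a, a)), right))
    case other => other
  }
}

object UnionPushdownExample {
  // The two sides use different qualifiers ("a" and "b"), so matching on qualifier would fail.
  val left: Relation = Relation(Seq(Attr("id", "a"), Attr("score", "a")))
  val right: Relation = Relation(Seq(Attr("id", "b"), Attr("score", "b")))
  val plan: Plan = Filter(GreaterThan(Attr("score", "a"), Lit(10)), Union(left, right))
  val optimized: Plan = UnionPushdownSketch(plan)
}
```

In this toy example the filter on `a.score` ends up as a filter on `a.score` over the left relation and on `b.score` over the right relation, even though the qualifiers differ, because the attributes are paired purely by their position in each child's output.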
[GitHub] spark pull request: SPARK-3462 push down filters and projections i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55185718 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20111/consoleFull) for PR 2345 at commit [`ef47b3b`](https://github.com/apache/spark/commit/ef47b3b80dd92f4652947ccffa5c9fea97adffb0). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.