[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15575 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84411683

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }

 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Nit: also add an empty line here.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84411628

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
   override def outputOrdering: Seq[SortOrder] = sortOrder

+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Could we add a comment here? It would help others understand the code.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84410677

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }

 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Sorry, never mind. `child.outputPartitioning` is right.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84410707

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala ---
@@ -103,6 +103,8 @@ case class WindowExec(
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering

+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This is right. This operator does not shuffle data.
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84410586

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }

 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples :: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This should be `SinglePartition` since `requiredChildDistribution` is `AllTuples`.
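The exchange above turns on a subtle point: because `GlobalLimitExec` requires `AllTuples`, the planner inserts a shuffle to a single partition below it, so by execution time the child's partitioning is in fact `SinglePartition` and forwarding `child.outputPartitioning` gives the same answer. A minimal sketch of that planner logic, using simplified, hypothetical stand-ins for Spark's real classes (the real rule is `EnsureRequirements`; the names below are not Spark's actual signatures):

```scala
// Simplified models of physical-plan partitioning and required distributions.
sealed trait Partitioning { def numPartitions: Int }
case object SinglePartition extends Partitioning { val numPartitions = 1 }
case class HashPartitioning(numPartitions: Int) extends Partitioning

sealed trait Distribution
case object AllTuples extends Distribution            // all rows in one partition
case object UnspecifiedDistribution extends Distribution

// Does a child's partitioning satisfy a parent's required distribution?
def satisfies(p: Partitioning, d: Distribution): Boolean = d match {
  case AllTuples               => p.numPartitions == 1
  case UnspecifiedDistribution => true
}

// Stand-in for the planner rule: when the child does not satisfy the
// requirement, insert an exchange (modeled here as just the resulting
// partitioning). AllTuples forces an exchange down to a single partition.
def ensure(childPartitioning: Partitioning, required: Distribution): Partitioning =
  if (satisfies(childPartitioning, required)) childPartitioning
  else SinglePartition
```

Under this model, `ensure(HashPartitioning(8), AllTuples)` yields `SinglePartition`, which is why both reviewers' readings agree in the end.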
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84410162

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
   override def outputOrdering: Seq[SortOrder] = sortOrder

+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

No. ShuffleExchange is the only operator that can change `outputPartitioning`. `global` here only influences the value of `requiredChildDistribution`, which later helps the planner decide whether to add a ShuffleExchange operator for range partitioning.
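The distinction above can be made concrete: `global` feeds the sort's *requirement* on its child, while the sort's *output* partitioning simply forwards the child's, because sorting rearranges rows within partitions only. A sketch with simplified stand-ins (`SortSketch` and `SomeInputPartitioning` are hypothetical; `OrderedDistribution` and `UnspecifiedDistribution` mirror real Spark names but not their exact signatures):

```scala
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
case class OrderedDistribution(sortCols: Seq[String]) extends Distribution

sealed trait Partitioning
case class SomeInputPartitioning(name: String) extends Partitioning

case class SortSketch(sortCols: Seq[String], global: Boolean,
                      childPartitioning: Partitioning) {
  // Mirrors the idea behind SortExec.requiredChildDistribution: a global
  // sort needs rows range-distributed across partitions; a partition-local
  // sort accepts any distribution.
  def requiredChildDistribution: Seq[Distribution] =
    if (global) OrderedDistribution(sortCols) :: Nil
    else UnspecifiedDistribution :: Nil

  // The sort never moves rows between partitions, so it forwards whatever
  // partitioning its (possibly exchanged) child ends up with.
  def outputPartitioning: Partitioning = childPartitioning
}
```

If the planner decides an exchange is needed for the global case, that exchange becomes the child, and forwarding `child.outputPartitioning` is still correct.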
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84406052

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala ---
@@ -103,6 +103,8 @@ case class WindowExec(
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering

+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Here too: it depends on what "child" means and how outputPartitioning is used.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84405685

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -62,6 +62,7 @@ case class HashAggregateExec(
     "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "aggregate time"))

   override def output: Seq[Attribute] = resultExpressions.map(_.toAttribute)
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Add a blank line?
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15575#discussion_r84405648

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
   override def outputOrdering: Seq[SortOrder] = sortOrder

+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This depends on `global`, doesn't it? cc @yhuai
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15575

[SPARK-18038] [SQL] Move output partitioning definition from UnaryNodeExec to its children

## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-18038

This was a suggestion by @rxin in a dev-list discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/Project-not-preserving-child-partitioning-td19417.html

His words:

> It would be better (safer) to move the output partitioning definition into each of the operators and remove it from UnaryExecNode.

## How was this patch tested?

This does NOT change any existing functionality, so it relies on the existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-18038_UnaryNodeExec_output_partitioning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15575.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15575

commit 85fb132196a700072e5a8e70ebf17ae24d7958e4
Author: Tejas Patil
Date: 2016-10-21T00:21:49Z

    [SPARK-18038] [SQL] Move output partitioning definition from UnaryNodeExec to its children
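The shape of the proposed change can be sketched in miniature. This uses simplified, hypothetical stand-ins for Spark's classes (none of these are the real signatures): before the PR, the unary-node base trait supplied `outputPartitioning = child.outputPartitioning` as a blanket default, which is silently wrong for any operator that moves rows between partitions; after the PR, the trait leaves it abstract and each concrete operator states its own.

```scala
sealed trait Partitioning
case class HashPartitioning(cols: Seq[String]) extends Partitioning
case object SinglePartition extends Partitioning

trait PlanSketch {
  def outputPartitioning: Partitioning
}

trait UnaryNodeSketch extends PlanSketch {
  def child: PlanSketch
  // Intentionally NO default implementation of outputPartitioning here:
  // after the change, each concrete operator must decide for itself.
}

case class ScanSketch(p: Partitioning) extends PlanSketch {
  def outputPartitioning: Partitioning = p
}

// A projection keeps every row in its partition, so forwarding the
// child's partitioning is safe, and now it is an explicit choice.
case class ProjectSketch(child: PlanSketch) extends UnaryNodeSketch {
  def outputPartitioning: Partitioning = child.outputPartitioning
}

// An exchange redistributes rows, so it must declare its own result
// partitioning; an inherited default would have been wrong here.
case class ExchangeSketch(child: PlanSketch, target: Partitioning)
    extends UnaryNodeSketch {
  def outputPartitioning: Partitioning = target
}
```

Making the abstract member explicit turns a silent inheritance bug into a compile error for any new unary operator that forgets to declare its partitioning.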