[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15575


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84411683
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Nit: also add an empty line here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84411628
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
 
   override def outputOrdering: Seq[SortOrder] = sortOrder
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

Could we add comments here? It can help others understand the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410677
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

sorry. nvm. `child.outputPartitioning` is right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410707
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala 
---
@@ -103,6 +103,8 @@ case class WindowExec(
 
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This is right. This operator does not shuffle data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410586
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -96,13 +95,15 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
  */
 case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+  override def outputPartitioning: Partitioning = child.outputPartitioning
 }
 
 /**
  * Take the first `limit` elements of the child's single output partition.
  */
 case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
BaseLimitExec {
   override def requiredChildDistribution: List[Distribution] = AllTuples 
:: Nil
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This should be `SinglePartition` since `requiredChildDistribution` is 
`AllTuples`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84410162
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
 
   override def outputOrdering: Seq[SortOrder] = sortOrder
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

No. ShuffleExchange is the only operator that can change 
`outputPartitioning`. `global` at here only influences the value of 
`requiredChildDistribution`, which later help the planner decide if to add a 
ShuffleExchange operator for range partitioning.   


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84406052
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala 
---
@@ -103,6 +103,8 @@ case class WindowExec(
 
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

here too - it depends on what "child" meant and how outputPartitioning is 
used.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84405685
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 ---
@@ -62,6 +62,7 @@ case class HashAggregateExec(
 "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "aggregate 
time"))
 
   override def output: Seq[Attribute] = 
resultExpressions.map(_.toAttribute)
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

add a blank line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15575#discussion_r84405648
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala ---
@@ -45,6 +45,8 @@ case class SortExec(
 
   override def outputOrdering: Seq[SortOrder] = sortOrder
 
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

this depends on global, doesn't it?

cc @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15575: [SPARK-18038] [SQL] Move output partitioning defi...

2016-10-20 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/15575

[SPARK-18038] [SQL] Move output partitioning definition from UnaryNodeExec 
to its children

## What changes were proposed in this pull request?

Jira : https://issues.apache.org/jira/browse/SPARK-18038

This was a suggestion by @rxin over one of the dev list discussion : 
http://apache-spark-developers-list.1001551.n3.nabble.com/Project-not-preserving-child-partitioning-td19417.html

His words:

>> It would be better (safer) to move the output partitioning definition 
into each of the operator and remove it from UnaryExecNode.


## How was this patch tested?

This does NOT change any existing functionality so relying on existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark 
SPARK-18038_UnaryNodeExec_output_partitioning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15575


commit 85fb132196a700072e5a8e70ebf17ae24d7958e4
Author: Tejas Patil 
Date:   2016-10-21T00:21:49Z

[SPARK-18038] [SQL] Move output partitioning definition from UnaryNodeExec 
to its children




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org