[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

GitBox Fri, 17 Jul 2020 07:40:57 -0700


c21 commented on a change in pull request #29130:
URL: https://github.com/apache/spark/pull/29130#discussion_r456484701




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##########
@@ -47,6 +47,18 @@ case class ShuffledHashJoinExec(
     "buildDataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size of 
build side"),
     "buildTime" -> SQLMetrics.createTimingMetric(sparkContext, "time to build 
hash map"))
 
+  override def outputPartitioning: Partitioning = joinType match {

Review comment:
       @cloud-fan - [there's an extra case for sort merge join to handle full 
outer 
join](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L81).
 I am thinking to handle all other join types in parent trait `ShuffleJoin`, 
and override `outputPartitioning` in `SortMergeJoinExec` to handle the extra 
`FullOuter`? What do you think?
   
   But for me it's kind of weird that `ShuffleJoin` not handle  `FullOuter` as 
shuffled `FullOuter` join is one kind of `ShuffleJoin`. But if `ShuffleJoin` 
handles `FullOuter`, it seems to be also weird that `ShuffledHashJoinExec` 
extends it.
   
   Wondering what do you think? The change itself is easy. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] c21 commented on a change in pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

Reply via email to