maropu commented on a change in pull request #29677:
URL: https://github.com/apache/spark/pull/29677#discussion_r496338261



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##########
@@ -52,7 +52,10 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
      case (child, distribution) =>
        val numPartitions = distribution.requiredNumPartitions
          .getOrElse(conf.numShufflePartitions)
-        ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+        // Like optimizer.CollapseRepartition removes adjacent repartition operations,
+        // adjacent repartitions performed by shuffle can be also removed.
+        val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child

Review comment:
       To avoid the case @HyukjinKwon pointed out above, it seems we need to
check whether `outputPartitioning` is the same, to narrow the scope of this
optimization.
   
   Btw, in my opinion, to avoid complicating the `EnsureRequirements` rule
further, it would be better to remove these kinds of redundant shuffles in a new
rule that runs after `EnsureRequirements`, like https://github.com/apache/spark/pull/27096.
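   
   A minimal sketch of the guarded collapse, for illustration only (it assumes the surrounding `(child, distribution)` match in `EnsureRequirements` and compares against the partitioning the new shuffle would produce; the `targetPartitioning` name is hypothetical):
   
   ```scala
   // Sketch only: drop the inner shuffle only when its output partitioning
   // already equals what the new shuffle would produce, so removing it
   // cannot change the result, which addresses the case raised above.
   val targetPartitioning = distribution.createPartitioning(numPartitions)
   val newChild = child match {
     case s: ShuffleExchangeExec if s.outputPartitioning == targetPartitioning =>
       s.child
     case _ => child
   }
   ShuffleExchangeExec(targetPartitioning, newChild)
   ```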
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
