maropu commented on a change in pull request #29677:
URL: https://github.com/apache/spark/pull/29677#discussion_r496338261
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##########
@@ -52,7 +52,10 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
case (child, distribution) =>
val numPartitions = distribution.requiredNumPartitions.getOrElse(conf.numShufflePartitions)
- ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+ // Like optimizer.CollapseRepartition removes adjacent repartition operations,
+ // adjacent repartitions performed by shuffle can also be removed.
+ val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child
Review comment:
To avoid the case @HyukjinKwon pointed out above, it seems we need to check whether the `outputPartitioning` is the same, to narrow the scope of this optimization.
Btw, in my opinion, to avoid complicating the `EnsureRequirements` rule further, it would be better to remove these kinds of redundant shuffles in a new rule that runs after `EnsureRequirements`, like https://github.com/apache/spark/pull/27096.
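To make the suggested narrowing concrete, here is a minimal, self-contained Scala sketch (a toy plan tree, not actual Spark internals; the `Plan`/`Shuffle`/`Scan` names and the string-valued partitioning are illustrative assumptions) of a standalone rule that collapses adjacent shuffles only when the inner shuffle's output partitioning matches the outer one's:

```scala
// Toy stand-ins for SparkPlan / ShuffleExchangeExec (names are hypothetical).
sealed trait Plan
case class Scan(name: String) extends Plan
case class Shuffle(partitioning: String, child: Plan) extends Plan

// Sketch of a post-EnsureRequirements cleanup rule: an outer shuffle makes an
// inner shuffle redundant only when both produce the same partitioning;
// otherwise the inner shuffle changes the data layout and must be kept.
object CollapseAdjacentShuffles {
  def apply(plan: Plan): Plan = plan match {
    case Shuffle(p, Shuffle(q, grandChild)) if p == q =>
      // Drop the redundant inner shuffle, then recurse in case of longer chains.
      apply(Shuffle(p, grandChild))
    case Shuffle(p, child) => Shuffle(p, apply(child))
    case other => other
  }
}
```

With this guard, `Shuffle("hash(a, 200)", Shuffle("hash(a, 200)", Scan("t")))` collapses to a single shuffle, while a chain with differing partitionings is left untouched, which is the case raised above.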
---------------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]