sarutak commented on a change in pull request #29677:
URL: https://github.com/apache/spark/pull/29677#discussion_r496746261
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##########
@@ -52,7 +52,10 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
       case (child, distribution) =>
         val numPartitions = distribution.requiredNumPartitions
           .getOrElse(conf.numShufflePartitions)
-        ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+        // Like optimizer.CollapseRepartition removes adjacent repartition operations,
+        // adjacent repartitions performed by shuffle can also be removed.
+        val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child
Review comment:
       I have considered such a case, but if a shuffle/partitioning is performed
       immediately on top of another shuffle/partitioning (its child), is the child
       shuffle/partitioning meaningful?
       In the example above, isn't the result the same regardless of whether the
       `RoundRobinPartitioning` is removed or not?
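       To illustrate, here is a minimal, hypothetical sketch (not taken from this PR or its
       tests; it assumes a local `SparkSession` and disables broadcast joins to force a
       sort-merge join). The user-specified `repartition(10)` puts a round-robin shuffle
       directly under the hash-partitioned shuffle that the join requires, so removing the
       round-robin shuffle should not change the join result:

       ```scala
       import org.apache.spark.sql.SparkSession

       object AdjacentShuffleSketch {
         def main(args: Array[String]): Unit = {
           val spark = SparkSession.builder()
             .appName("adjacent-shuffle-sketch")
             .master("local[*]")
             // force a sort-merge join so both join sides need a hash-partitioned shuffle
             .config("spark.sql.autoBroadcastJoinThreshold", "-1")
             .getOrCreate()
           import spark.implicits._

           val left = Seq((1, "a"), (2, "b")).toDF("id", "v1").repartition(10)
           val right = Seq((1, "x"), (2, "y")).toDF("id", "v2")

           // repartition(10) puts a ShuffleExchangeExec(RoundRobinPartitioning) under the
           // left side; the join requires hash partitioning on "id", so EnsureRequirements
           // adds another ShuffleExchangeExec directly on top of it. The join result is the
           // same whether the round-robin shuffle is kept or removed.
           val joined = left.join(right, "id")
           joined.explain()
           joined.show()

           spark.stop()
         }
       }
       ```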
       I checked #26946 and I understand that the root cause of that issue was a bug in
       [how hints are handled in the
       parser](https://github.com/apache/spark/pull/26946#issuecomment-580861982), so an
       optimizer-based approach was wrong and incomplete for that issue.
       The solutions in this PR and that PR are similar, but the issues are different.
This PR just focuses on removing redundant shuffles.
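       For reference, here is a simplified, hypothetical sketch of the idea as a standalone
       rule (not the PR's actual change, which applies the skip at the point where
       `EnsureRequirements` creates the new exchange):

       ```scala
       import org.apache.spark.sql.execution.SparkPlan
       import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

       // Hypothetical helper: when a shuffle exchange sits directly on top of another
       // shuffle exchange, the inner exchange's partitioning is immediately overwritten
       // by the outer one, so the inner exchange can be bypassed.
       object CollapseAdjacentShuffles {
         def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
           case outer: ShuffleExchangeExec if outer.child.isInstanceOf[ShuffleExchangeExec] =>
             val inner = outer.child.asInstanceOf[ShuffleExchangeExec]
             // keep the outer shuffle, but feed it the inner shuffle's input directly
             outer.withNewChildren(inner.child :: Nil)
         }
       }
       ```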
       I might be misunderstanding your point and that PR, so please correct me if so.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]