sarutak commented on a change in pull request #29677:
URL: https://github.com/apache/spark/pull/29677#discussion_r496746261
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##########
@@ -52,7 +52,10 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
       case (child, distribution) =>
         val numPartitions = distribution.requiredNumPartitions
           .getOrElse(conf.numShufflePartitions)
-        ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
+        // Like optimizer.CollapseRepartition removes adjacent repartition operations,
+        // adjacent repartitions performed by shuffle can also be removed.
+        val newChild = if (child.isInstanceOf[ShuffleExchangeExec]) child.children.head else child
Review comment:
       I have considered such a case, but if a shuffle/partitioning is performed
       immediately on top of another shuffle/partitioning (its child), is the child
       shuffle/partitioning meaningful?
       In the example above, isn't the result the same regardless of whether the
       `RoundRobinPartitioning` is removed or not?
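       To illustrate, here is a minimal, hypothetical sketch (not taken from this PR or its
       tests; it assumes a local `SparkSession` and disables broadcast joins to force a
       sort-merge join). The user-specified `repartition(10)` puts a round-robin shuffle
       directly under the hash-partitioned shuffle that the join requires, so removing the
       round-robin shuffle should not change the join result:

       ```scala
       import org.apache.spark.sql.SparkSession

       object AdjacentShuffleSketch {
         def main(args: Array[String]): Unit = {
           val spark = SparkSession.builder()
             .appName("adjacent-shuffle-sketch")
             .master("local[*]")
             // force a sort-merge join so both join sides need a hash-partitioned shuffle
             .config("spark.sql.autoBroadcastJoinThreshold", "-1")
             .getOrCreate()
           import spark.implicits._

           val left = Seq((1, "a"), (2, "b")).toDF("id", "v1").repartition(10)
           val right = Seq((1, "x"), (2, "y")).toDF("id", "v2")

           // repartition(10) puts a ShuffleExchangeExec(RoundRobinPartitioning) under the
           // left side; the join requires hash partitioning on "id", so EnsureRequirements
           // adds another ShuffleExchangeExec directly on top of it. The join result is the
           // same whether the round-robin shuffle is kept or removed.
           val joined = left.join(right, "id")
           joined.explain()
           joined.show()

           spark.stop()
         }
       }
       ```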
       I checked #26946 and I understand that the root cause of that issue was a bug in
       [how hints are handled in the
       parser](https://github.com/apache/spark/pull/26946#issuecomment-580861982), so an
       optimizer-based approach was wrong and incomplete for that issue.
       The solutions in this PR and that PR are similar, but the issues are different.
This PR just focuses on removing redundant shuffles.
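       For reference, here is a simplified, hypothetical sketch of the idea as a standalone
       rule (not the PR's actual change, which applies the skip at the point where
       `EnsureRequirements` creates the new exchange):

       ```scala
       import org.apache.spark.sql.execution.SparkPlan
       import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec

       // Hypothetical helper: when a shuffle exchange sits directly on top of another
       // shuffle exchange, the inner exchange's partitioning is immediately overwritten
       // by the outer one, so the inner exchange can be bypassed.
       object CollapseAdjacentShuffles {
         def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
           case outer: ShuffleExchangeExec if outer.child.isInstanceOf[ShuffleExchangeExec] =>
             val inner = outer.child.asInstanceOf[ShuffleExchangeExec]
             // keep the outer shuffle, but feed it the inner shuffle's input directly
             outer.withNewChildren(inner.child :: Nil)
         }
       }
       ```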
       I might be misunderstanding your point and that PR, so please correct me if so.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]