[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

GitBox Wed, 24 Aug 2022 23:44:54 -0700


cloud-fan commented on code in PR #37612:
URL: https://github.com/apache/spark/pull/37612#discussion_r954560331



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala:
##########
@@ -28,16 +28,31 @@ object AQEUtils {
   def getRequiredDistribution(p: SparkPlan): Option[Distribution] = p match {
     // User-specified repartition is only effective when it's the root node, or under
     // Project/Filter/LocalSort/CollectMetrics.
-    // Note: we only care about `HashPartitioning` as `EnsureRequirements` can only optimize out
-    // user-specified repartition with `HashPartitioning`.
-    case ShuffleExchangeExec(h: HashPartitioning, _, shuffleOrigin)
+    // Note, here are two cases of how user-specified repartition can be optimized out:
+    // 1. `EnsureRequirements` can only optimize out user-specified repartition with
+    //    `HashPartitioning`.
+    // 2. `AQEOptimizer` can optimize out user-specified repartition with all `Partitioning`,
+    //     e.g. convert empty to local relation.

Review Comment:
   OK, let me make my proposal clear: let's never optimize out a repartition when it's 
the root node or sits below Project/Filter. What do you think?
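
   To illustrate the proposal, here is a minimal sketch of the check it implies. 
It uses simplified stand-in types (`Plan`, `Scan`, `Join`, `Shuffle`, and the 
`userSpecified` flag are all hypothetical), not Spark's real `SparkPlan` or 
`ShuffleExchangeExec` classes, and it covers only Project/Filter as named above. 
It is not the code in this PR.

```scala
// Simplified stand-ins for physical plan nodes; these are NOT Spark's real classes.
sealed trait Plan
case class Scan(name: String) extends Plan
case class Project(child: Plan) extends Plan
case class Filter(child: Plan) extends Plan
case class Join(left: Plan, right: Plan) extends Plan
// `userSpecified` (hypothetical) marks a shuffle created by an explicit repartition.
case class Shuffle(child: Plan, userSpecified: Boolean) extends Plan

object RepartitionGuard {
  // True when a user-specified repartition is the root node, or is reachable
  // from the root through only Project/Filter nodes. In that situation the
  // proposal is to never optimize the repartition out.
  @scala.annotation.tailrec
  def mustKeepRepartition(plan: Plan): Boolean = plan match {
    case Shuffle(_, userSpecified) => userSpecified
    case Project(child)            => mustKeepRepartition(child)
    case Filter(child)             => mustKeepRepartition(child)
    case _                         => false
  }
}
```

   Under these assumptions, `mustKeepRepartition(Project(Shuffle(Scan("t"), userSpecified = true)))` 
holds, while the same shuffle placed under a `Join` does not, since the join 
hides it from the final output partitioning.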



