[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

GitBox Wed, 24 Aug 2022 20:01:50 -0700


cloud-fan commented on code in PR #37612:
URL: https://github.com/apache/spark/pull/37612#discussion_r954457377



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala:
##########
@@ -28,16 +28,31 @@ object AQEUtils {
   def getRequiredDistribution(p: SparkPlan): Option[Distribution] = p match {
     // User-specified repartition is only effective when it's the root node, or under
     // Project/Filter/LocalSort/CollectMetrics.
-    // Note: we only care about `HashPartitioning` as `EnsureRequirements` can only optimize out
-    // user-specified repartition with `HashPartitioning`.
-    case ShuffleExchangeExec(h: HashPartitioning, _, shuffleOrigin)
+    // Note, here are two cases of how user-specified repartition can be optimized out:
+    // 1. `EnsureRequirements` can only optimize out user-specified repartition with
+    //    `HashPartitioning`.
+    // 2. `AQEOptimizer` can optimize out user-specified repartition with all `Partitioning`,
+    //     e.g. convert empty to local relation.

Review Comment:
   Shall we fix `PropagateEmptyRelationBase` instead? I don't think we can optimize out a `Repartition`, since doing so breaks user expectations. The change here only covers AQE, and I think this is a problem for non-AQE as well.
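To make the suggestion concrete, here is a minimal Scala sketch under stated assumptions. It uses a hypothetical toy plan model: the `Plan`, `LocalRelation`, `Filter`, `Project` and `Repartition` classes and the `EmptyPropagation.propagate` function are illustrative inventions, not Spark's Catalyst API and not the actual `PropagateEmptyRelationBase` code. It contrasts an empty-relation rule that collapses a repartition over an empty child with one that keeps the repartition node, so the user-specified partitioning survives.

```scala
// Hypothetical toy model of a logical plan. These are NOT Spark's Catalyst
// classes; they only illustrate where a user-specified repartition sits.
sealed trait Plan {
  def isEmpty: Boolean = false
}
final case class LocalRelation(rows: Seq[Int]) extends Plan {
  override def isEmpty: Boolean = rows.isEmpty
}
final case class Filter(pred: Int => Boolean, child: Plan) extends Plan
final case class Project(f: Int => Int, child: Plan) extends Plan
// Stands in for a user-specified repartition (e.g. df.repartition(n)).
final case class Repartition(numPartitions: Int, child: Plan) extends Plan

object EmptyPropagation {
  private val Empty: Plan = LocalRelation(Seq.empty)

  // Bottom-up empty-relation propagation. Filter/Project over an empty child
  // collapse to an empty LocalRelation. When keepRepartition is true, a
  // Repartition node is never collapsed, so its output partitioning is kept
  // even if its child is provably empty.
  def propagate(plan: Plan, keepRepartition: Boolean): Plan = plan match {
    case leaf: LocalRelation => leaf
    case Filter(p, child) =>
      val c = propagate(child, keepRepartition)
      if (c.isEmpty) Empty else Filter(p, c)
    case Project(f, child) =>
      val c = propagate(child, keepRepartition)
      if (c.isEmpty) Empty else Project(f, c)
    case Repartition(n, child) =>
      val c = propagate(child, keepRepartition)
      if (c.isEmpty && !keepRepartition) Empty else Repartition(n, c)
  }
}
```

With `keepRepartition = false`, `Repartition(10, Filter(_ => false, LocalRelation(Seq.empty)))` collapses to an empty `LocalRelation`, losing the requested 10 partitions; with `keepRepartition = true` it becomes `Repartition(10, LocalRelation(Seq.empty))`. Placing such a guard in the optimizer rule itself, rather than in AQE-only code, is what would let it apply whether or not AQE is enabled.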


