ulysses-you commented on code in PR #37612:
URL: https://github.com/apache/spark/pull/37612#discussion_r954473375


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala:
##########
@@ -28,16 +28,31 @@ object AQEUtils {
   def getRequiredDistribution(p: SparkPlan): Option[Distribution] = p match {
     // User-specified repartition is only effective when it's the root node, 
or under
     // Project/Filter/LocalSort/CollectMetrics.
-    // Note: we only care about `HashPartitioning` as `EnsureRequirements` can 
only optimize out
-    // user-specified repartition with `HashPartitioning`.
-    case ShuffleExchangeExec(h: HashPartitioning, _, shuffleOrigin)
+    // Note, here are two cases of how user-specified repartition can be 
optimized out:
+    // 1. `EnsureRequirements` can only optimize out user-specified 
repartition with
+    //    `HashPartitioning`.
+    // 2. `AQEOptimizer` can optimize out user-specified repartition with all 
`Partitioning`,
+    //     e.g. convert empty to local relation.

Review Comment:
   > shall we fix PropagateEmptyRelationBase instead? I don't think we can 
optimize out Repartition which breaks user expectations. The change here only 
covers AQE and I think this is a problem for non AQE as well.
   
   if we do not optimize the repartition at the top of empty relation, we also 
can not optimize other plan at the top of repartition. It may become a 
regression. We should only preserve the final repatition which can affect the 
final output partition as the `requiredDistribution` did.
   
   > Or, we can still optimize out Repartition, but we should assign a 
Partitioning to LocalRelation. Ideally a empty data can be any partitioning.
   
   I actually thought about it, but it's not easy. If we want to assign a 
`Partitioning` to `LocalRelation`, we also need to consider how to propagate 
the `Partitioning` when we through other plan otherwise it does less meaning. 
Then we need care the output partitioning for all logical plan. This should not 
be expected since the partitioning is a physical concept.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to