cloud-fan commented on code in PR #37612:
URL: https://github.com/apache/spark/pull/37612#discussion_r954457377
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala:
##########
@@ -28,16 +28,31 @@ object AQEUtils {
def getRequiredDistribution(p: SparkPlan): Option[Distribution] = p match {
// User-specified repartition is only effective when it's the root node,
or under
// Project/Filter/LocalSort/CollectMetrics.
- // Note: we only care about `HashPartitioning` as `EnsureRequirements` can
only optimize out
- // user-specified repartition with `HashPartitioning`.
- case ShuffleExchangeExec(h: HashPartitioning, _, shuffleOrigin)
+ // Note, here are two cases of how user-specified repartition can be
optimized out:
+ // 1. `EnsureRequirements` can only optimize out user-specified
repartition with
+ // `HashPartitioning`.
+ // 2. `AQEOptimizer` can optimize out user-specified repartition with all
`Partitioning`,
+ // e.g. convert empty to local relation.
Review Comment:
shall we fix `PropagateEmptyRelationBase` instead? I don't think we can
optimize out `Repartition` which breaks user expectations. The change here only
covers AQE and I think this is a problem for non AQE as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]