wangyum commented on PR #40805: URL: https://github.com/apache/spark/pull/40805#issuecomment-1511025438
Pulling out the join condition can't fix this issue because the [output partitioning](https://github.com/apache/spark/blob/72922adc8a78e8d31f03205a148b89291a9a4d19/sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala#L57) is `UnknownPartitioning(Bucket Number)`, which can't satisfy the `SortMergeJoin`'s required distribution, so a shuffle still has to be introduced.

```sql
SELECT *
FROM (SELECT Cast(i AS DECIMAL(20, 0)) AS i FROM t2) tmp1
JOIN (SELECT Cast(i AS DECIMAL(20, 0)) AS i FROM t3) tmp2
ON tmp1.i = tmp2.i;
```

```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- SortMergeJoin [i#23], [i#24], Inner
   :- Sort [i#23 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(i#23, 200), ENSURE_REQUIREMENTS, [plan_id=130]
   :     +- Project [cast(i#19L as decimal(20,0)) AS i#23]
   :        +- Filter isnotnull(cast(i#19L as decimal(20,0)))
   :           +- FileScan parquet spark_catalog.default.t2[i#19L] Batched: true, Bucketed: false (disabled by query planner)
   +- Sort [i#24 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(i#24, 200), ENSURE_REQUIREMENTS, [plan_id=135]
         +- Project [cast(i#20 as decimal(20,0)) AS i#24]
            +- Filter isnotnull(cast(i#20 as decimal(20,0)))
               +- FileScan parquet spark_catalog.default.t3[i#20] Batched: true, Bucketed: false (disabled by query planner)
```
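For readers trying to reproduce the plan above, a minimal setup sketch is below. The table names and join column come from the query; the exact column types (`bigint` for `t2.i`, `int` for `t3.i`, inferred from `i#19L` and `i#20` in the plan) and the bucket count are assumptions for illustration only:

```sql
-- Sketch: bucketed Parquet tables whose bucketing the planner disables
-- once the join key is wrapped in a CAST to DECIMAL(20, 0).
-- Bucket count (4) is illustrative, not taken from the original report.
CREATE TABLE t2 (i BIGINT) USING parquet CLUSTERED BY (i) INTO 4 BUCKETS;
CREATE TABLE t3 (i INT)    USING parquet CLUSTERED BY (i) INTO 4 BUCKETS;
```

With this setup, the `Cast(i AS DECIMAL(20, 0))` in each subquery produces an output partitioning the planner cannot match to the bucketing, which is why both `FileScan` nodes report `Bucketed: false (disabled by query planner)` and an `Exchange` appears on each side.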
