wangyum commented on PR #40805: URL: https://github.com/apache/spark/pull/40805#issuecomment-1511025438
Pulling out the join condition can't fix this issue because the [output partitioning](https://github.com/apache/spark/blob/72922adc8a78e8d31f03205a148b89291a9a4d19/sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala#L57) is `UnknownPartitioning(Bucket Number)`, which can't satisfy the `SortMergeJoin`'s required distribution, so a shuffle still has to be introduced.

```sql
SELECT *
FROM (SELECT Cast(i AS DECIMAL(20, 0)) AS i FROM t2) tmp1
JOIN (SELECT Cast(i AS DECIMAL(20, 0)) AS i FROM t3) tmp2
ON tmp1.i = tmp2.i;
```

```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- SortMergeJoin [i#23], [i#24], Inner
   :- Sort [i#23 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(i#23, 200), ENSURE_REQUIREMENTS, [plan_id=130]
   :     +- Project [cast(i#19L as decimal(20,0)) AS i#23]
   :        +- Filter isnotnull(cast(i#19L as decimal(20,0)))
   :           +- FileScan parquet spark_catalog.default.t2[i#19L] Batched: true, Bucketed: false (disabled by query planner)
   +- Sort [i#24 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(i#24, 200), ENSURE_REQUIREMENTS, [plan_id=135]
         +- Project [cast(i#20 as decimal(20,0)) AS i#24]
            +- Filter isnotnull(cast(i#20 as decimal(20,0)))
               +- FileScan parquet spark_catalog.default.t3[i#20] Batched: true, Bucketed: false (disabled by query planner)
```
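For readers trying to reproduce the plan above, a minimal setup sketch is below. The table names and join column come from the query; the exact column types (`bigint` for `t2.i`, `int` for `t3.i`, inferred from `i#19L` and `i#20` in the plan) and the bucket count are assumptions for illustration only:

```sql
-- Sketch: bucketed Parquet tables whose bucketing the planner disables
-- once the join key is wrapped in a CAST to DECIMAL(20, 0).
-- Bucket count (4) is illustrative, not taken from the original report.
CREATE TABLE t2 (i BIGINT) USING parquet CLUSTERED BY (i) INTO 4 BUCKETS;
CREATE TABLE t3 (i INT)    USING parquet CLUSTERED BY (i) INTO 4 BUCKETS;
```

With this setup, the `Cast(i AS DECIMAL(20, 0))` in each subquery produces an output partitioning the planner cannot match to the bucketing, which is why both `FileScan` nodes report `Bucketed: false (disabled by query planner)` and an `Exchange` appears on each side.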
