sunchao opened a new pull request, #56348:
URL: https://github.com/apache/spark/pull/56348

   ### What changes were proposed in this pull request?
   
   Extend `spark.sql.shuffle.spreadNullJoinKeys.enabled` to shuffled `LEFT ANTI`
   equi-joins when the preserved left-side join keys are nullable.
   
   The planner requests the existing null-aware clustered distribution for eligible
   left anti joins. Non-NULL keys retain normal hash placement, while NULL keys may
   be spread across shuffle partitions. This PR also updates the configuration
   documentation.
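
As a rough illustration of the placement rule described above, the following standalone Python sketch models a partitioner that hashes non-NULL keys and optionally spreads NULL keys. It is not Spark code: `make_partitioner`, the CRC32 hash, the fixed partition `0` for NULL keys, and the round-robin spreading are all assumptions made for the example, and the actual distribution used by the planner may differ.

```python
import zlib
from collections import Counter
from itertools import count


def make_partitioner(num_partitions, spread_nulls):
    """Build a key -> partition id function (illustrative, not Spark's code)."""
    null_slots = count()  # assumption: round-robin placement for NULL keys

    def partition_for(key):
        if key is None:
            if spread_nulls:
                return next(null_slots) % num_partitions
            return 0  # assumption: every NULL key lands on one fixed partition
        # Deterministic hash, so equal non-NULL keys always co-locate.
        return zlib.crc32(repr(key).encode("utf-8")) % num_partitions

    return partition_for


plain = make_partitioner(4, spread_nulls=False)
spread = make_partitioner(4, spread_nulls=True)

null_parts_plain = Counter(plain(None) for _ in range(8))
null_parts_spread = Counter(spread(None) for _ in range(8))
print(null_parts_plain)   # all eight NULL keys on one partition
print(null_parts_spread)  # NULL keys distributed over the four partitions
```

In this model, non-NULL keys map to the same partition in both modes, which is the property that keeps matching rows together.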
   
   The tests cover sort-merge and shuffled-hash left anti joins, including result
   correctness and null-aware shuffle partitioning, plus AQE coalescing of the
   resulting partitioning.
   
   This follows the `LEFT ANTI` discussion in
   https://github.com/apache/spark/pull/55927.
   
   ### Why are the changes needed?
   
   For an ordinary `LEFT ANTI` equi-join, rows with NULL keys on the preserved left
   side cannot match and must be emitted. Standard hash partitioning sends all of
   those rows to the same reducer, which can create severe shuffle skew.
   
   Spreading the NULL-keyed rows changes only their physical placement, so it
   reduces this skew without changing join results.
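
The argument above can be checked on a toy model. The Python sketch below simulates an ordinary (not null-aware) left anti equi-join evaluated per partition, once with all NULL keys on one partition and once with NULL keys spread round-robin. The helper names, the sample rows, and the placement rules are hypothetical and only stand in for the real shuffle.

```python
def left_anti(left_rows, right_keys):
    """Keep left rows whose key equals no right key.

    A NULL (None) key never compares equal under equi-join semantics,
    so NULL-keyed left rows are always kept.
    """
    right = {k for k in right_keys if k is not None}
    return [row for row in left_rows if row[0] is None or row[0] not in right]


def partitioned_left_anti(left_rows, right_keys, num_partitions, spread_nulls):
    """Run the join partition by partition; return (rows, left partition sizes)."""
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    null_slot = 0
    for row in left_rows:
        key = row[0]
        if key is None and spread_nulls:
            pid = null_slot % num_partitions  # assumption: round-robin spreading
            null_slot += 1
        elif key is None:
            pid = 0  # assumption: NULL keys share one fixed partition
        else:
            pid = hash(key) % num_partitions  # integer keys hash deterministically
        left_parts[pid].append(row)
    for key in right_keys:
        pid = 0 if key is None else hash(key) % num_partitions
        right_parts[pid].append(key)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(left_anti(lp, rp))
    return sorted(out, key=repr), [len(p) for p in left_parts]


left = [(None, "a"), (None, "b"), (None, "c"), (None, "d"),
        (1, "e"), (2, "f"), (3, "g")]
right_keys = [2, None, 5]

plain_rows, plain_sizes = partitioned_left_anti(left, right_keys, 4, False)
spread_rows, spread_sizes = partitioned_left_anti(left, right_keys, 4, True)
print(plain_rows == spread_rows)  # same join result either way
print(plain_sizes, spread_sizes)  # left-side rows per partition
```

Because a NULL-keyed left row is emitted no matter which right-side rows share its partition, moving it to another partition cannot change the output; only the per-partition row counts differ.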
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but only when `spark.sql.shuffle.spreadNullJoinKeys.enabled` is enabled.
   Eligible shuffled left anti joins may spread NULL-keyed preserved rows across
   shuffle partitions. Query results are unchanged, and the configuration remains
   disabled by default.
   
   ### How was this patch tested?
   
   - `JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./build/sbt "sql/testOnly org.apache.spark.sql.execution.joins.ExistenceJoinSuite"` (118 tests passed)
   - `JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./build/sbt "sql/testOnly org.apache.spark.sql.execution.adaptive.AdaptiveQueryExecSuite -- -z 'SPARK-57282: spread NULL keys for left anti join'"` (1 test passed)
   - `JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./dev/lint-scala`
   - `git diff --check`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Codex GPT-5
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

