yadavay-amzn opened a new pull request, #56243:
URL: https://github.com/apache/spark/pull/56243

   ### What changes were proposed in this pull request?
   
   Add join-awareness to `CoalesceShufflePartitions`. When a coalesce group 
feeds a `SortMergeJoinExec` or `ShuffledHashJoinExec`, enforce a minimum 
partition count floor to prevent eliminating join parallelism.
   
   The floor is computed as:
   - If `spark.sql.adaptive.coalescePartitions.minPartitionNum` is explicitly 
set, use that value
   - Otherwise, use a data-aware heuristic: `max(2, ceil(totalGroupSize / 
advisoryTargetSize))`
   
   This prevents coalescing join stages to 1 partition (the bug) while still 
allowing reasonable coalescing for small data.
   
   ### Why are the changes needed?
   
   `CoalesceShufflePartitions` can coalesce shuffle partitions on join stages 
down to 1, concentrating the entire shuffle dataset into a single reducer task. 
This happens after `OptimizeSkewedJoin` has already determined no skew exists — 
a determination that becomes invalid once coalescing destroys the partition 
layout.
   
   Production evidence shows top straggler jobs (1000-3000x task duration vs 
stage average) all had `num_output_partitions = 1` after coalescing, with data 
concentration ratios up to 552,948x.
   
   The issue manifests when `COALESCE_PARTITIONS_PARALLELISM_FIRST = false` and 
no explicit `COALESCE_PARTITIONS_MIN_PARTITION_NUM` is set, causing 
`minNumPartitions = 1`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes — join stages will no longer be coalesced to unreasonably few 
partitions. Non-join stages (aggregations, repartitions) are unaffected and can 
still coalesce freely.
   
   ### How was this patch tested?
   
   Added 8 tests to `CoalesceShufflePartitionsSuite`:
   - SortMergeJoin respects partition floor
   - ShuffledHashJoin respects partition floor
   - Explicit `minPartitionNum` config used as floor for joins
   - Non-join stages can still coalesce to 1 (no regression)
   - `parallelismFirst=true` still works correctly
   - BroadcastHashJoin is not affected
   - Multi-way join (A⋈B⋈C) respects floor for all stages
   - Asymmetric data sizes still respect floor
   
   All 27 tests pass (20 existing + 7 new).
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to