[PR] [SPARK-55289][SQL][FOLLOWUP] Fix flaky test in-order-by.sql by disabling broadcast join [spark]

via GitHub Fri, 06 Mar 2026 10:32:35 -0800


yaooqinn opened a new pull request, #54663:
URL: https://github.com/apache/spark/pull/54663


   ### What changes were proposed in this pull request?
   
   Applies the same fix that #54072 (SPARK-55289) made for `in-set-operations.sql`:
   adds `--SET spark.sql.autoBroadcastJoinThreshold=-1` to `in-order-by.sql` to
   prevent OOM from BroadcastHashJoin accumulating hash tables on
   memory-constrained CI runners.
   
   ### Why are the changes needed?
   
   `in-order-by.sql` intermittently fails on CI with `SparkOutOfMemoryError`.
   The root cause is the same as for `in-set-operations.sql`: complex correlated
   IN-subqueries with multiple BroadcastHashJoin operations exceed the JVM heap
   under memory pressure.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Test-only change.
   
   ### How was this patch tested?
   
   Regenerated the golden files. The new output shows minor row reordering among
   rows with identical sort keys, which is expected when switching from
   BroadcastHashJoin to SortMergeJoin.
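
A toy sketch of why rows with identical sort keys can reorder when the join strategy changes. This is plain Python, not Spark's join code: the `hash_join` and `sort_merge_join` helpers, the sample rows, and the column layout are all hypothetical and only illustrate that the two strategies emit the same rows in a different order, so a stable sort on a key with ties preserves that difference.

```python
from operator import itemgetter

# Hypothetical rows: (join_key, sort_key, label). Two rows share sort_key 10.
left = [(2, 10, "p"), (1, 10, "q"), (3, 20, "r")]
right_keys = [1, 2, 3]  # unique keys on the other side of an IN-style semi join


def hash_join(left, right_keys):
    """Build a hash set from one side, then stream the other side in input order."""
    table = set(right_keys)
    return [row for row in left if row[0] in table]


def sort_merge_join(left, right_keys):
    """Sort both sides by join key, then merge; output follows join-key order."""
    ls = sorted(left, key=itemgetter(0))
    rs = sorted(right_keys)
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        if ls[i][0] < rs[j]:
            i += 1
        elif ls[i][0] > rs[j]:
            j += 1
        else:
            out.append(ls[i])
            i += 1  # right keys are unique, so only the left side advances
    return out


# ORDER BY sort_key, modeled as a stable sort: ties keep the join's output order.
via_hash = sorted(hash_join(left, right_keys), key=itemgetter(1))
via_smj = sorted(sort_merge_join(left, right_keys), key=itemgetter(1))

print([row[2] for row in via_hash])  # labels after the hash-join path
print([row[2] for row in via_smj])   # labels after the sort-merge path
```

Both paths return the same set of rows; only the relative order of the two rows with `sort_key` 10 differs, which is the kind of golden-file diff described above.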
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes, co-authored with GitHub Copilot.



