[GitHub] [spark] weixiuli opened a new pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

GitBox Mon, 07 Feb 2022 03:32:25 -0800


weixiuli opened a new pull request #35425:
URL: https://github.com/apache/spark/pull/35425



   
   ### What changes were proposed in this pull request?
   [SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414) disabled 
the timeout for all BroadcastQueryStageExecs in AQE. This can cause a 
regression when a BroadcastQueryStageExec does not come from shuffle query 
stages. For example, a big table may be misjudged as a small one and chosen for 
a BroadcastHashJoin; broadcasting that table may then take a long time, which 
is no better than a SortMergeJoin (the result of letting the job time out and 
rerunning it with spark.sql.autoBroadcastJoinThreshold=-1). The timeout is 
therefore still necessary for BroadcastQueryStageExec in this case.
   
   So we should disable the timeout for a BroadcastQueryStageExec only when it 
comes from shuffle query stages, whose runtime statistics are usually correct 
in AQE, and enable the timeout when it comes from anything else, whose 
statistics may be incorrect. The latter case then behaves the same as non-AQE.
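
The rule above can be sketched as a small decision function. This is only an illustrative model in plain Python, not the code in this PR: `PlanNode`, `contains_shuffle_stage`, and `broadcast_timeout_secs` are hypothetical names, and treating "comes from shuffle query stages" as "a shuffle query stage appears somewhere below the broadcast" is an assumption. The 300-second default mirrors the default of `spark.sql.broadcastTimeout`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    is_shuffle_query_stage: bool = False


def contains_shuffle_stage(node: PlanNode) -> bool:
    """Return True if this node or any descendant is a shuffle query stage."""
    if node.is_shuffle_query_stage:
        return True
    return any(contains_shuffle_stage(child) for child in node.children)


def broadcast_timeout_secs(
    broadcast_child: PlanNode, configured_timeout_secs: int = 300
) -> Optional[int]:
    """Pick the timeout for a broadcast stage under the proposed rule.

    Assumption: input derived from a shuffle query stage has trustworthy
    runtime statistics, so no timeout is applied (None). Any other input
    keeps the configured timeout, as in non-AQE execution.
    """
    if contains_shuffle_stage(broadcast_child):
        return None
    return configured_timeout_secs
```

In this model, a broadcast over `Filter -> ShuffleQueryStage -> FileScan` gets no timeout, while a broadcast directly over `Filter -> FileScan` keeps the configured one.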
   
   ### Why are the changes needed?
   
   To avoid a regression in AQE.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


