weixiuli opened a new pull request #35425: URL: https://github.com/apache/spark/pull/35425
### What changes were proposed in this pull request?

[SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414) disabled the timeout for all BroadcastQueryStageExecs in AQE. This can cause a regression in AQE when a BroadcastQueryStageExec does not come from shuffle query stages. For example, a big table may be misjudged as a small one in a BroadcastHashJoin, and broadcasting it may take a long time, which is no better than SortMergeJoin (which is what the job ends up using when it times out and is rerun with `spark.sql.autoBroadcastJoinThreshold=-1`). The timeout is still necessary for BroadcastQueryStageExec in this case.

So we should disable the timeout for a BroadcastQueryStageExec when it comes from shuffle query stages, whose runtime statistics are usually correct in AQE, but enable the timeout when it comes from other sources, whose statistics may be incorrect, keeping the behavior the same as in non-AQE.

### Why are the changes needed?

To avoid a regression in AQE.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
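The rule proposed above can be illustrated with a small standalone model. This is only a sketch and not the code in the PR: `PlanNode`, `comes_from_shuffle_stage`, and `broadcast_timeout_enabled` are hypothetical names, and it assumes that "comes from shuffle query stages" means the broadcast's child plan contains a shuffle query stage somewhere beneath it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not Spark's API)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    is_shuffle_query_stage: bool = False


def comes_from_shuffle_stage(node: PlanNode) -> bool:
    """Return True if the subtree rooted at `node` contains a shuffle query stage."""
    if node.is_shuffle_query_stage:
        return True
    return any(comes_from_shuffle_stage(child) for child in node.children)


def broadcast_timeout_enabled(broadcast_child: PlanNode) -> bool:
    """Assumed rule: keep the timeout unless runtime shuffle statistics back the plan."""
    return not comes_from_shuffle_stage(broadcast_child)


# A broadcast built directly on a scan relies on estimated statistics,
# so the timeout stays on (same as non-AQE).
scan_only = PlanNode("FileScan")

# A broadcast built on top of a materialized shuffle stage has runtime
# statistics, so the timeout is disabled.
over_shuffle = PlanNode(
    "Project",
    children=[PlanNode("ShuffleQueryStage", [PlanNode("FileScan")], True)],
)

print(broadcast_timeout_enabled(scan_only))
print(broadcast_timeout_enabled(over_shuffle))
```

In this model the timeout decision depends only on where the broadcast's input comes from, which matches the intent described above: trust runtime statistics, and stay cautious with estimated ones.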
