maryannxue commented on a change in pull request #28250:
URL: https://github.com/apache/spark/pull/28250#discussion_r411430727
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
##########
@@ -187,8 +191,24 @@ case class BroadcastQueryStageExec(
throw new IllegalStateException("wrong plan for broadcast stage:\n " +
plan.treeString)
}
+  @transient private lazy val materializeWithTimeout = {
+    val broadcastFuture = broadcast.completionFuture
+    val timeout = SQLConf.get.broadcastTimeout
+    val promise = Promise[Any]()
+    val fail = BroadcastQueryStageExec.scheduledExecutor.schedule(new Runnable() {
+      override def run(): Unit = {
+        promise.tryFailure(new SparkException(s"Could not execute broadcast in $timeout secs. " +
+          s"You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key} or " +
+          s"disable broadcast join by setting ${SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key} to -1"))
Review comment:
This is already handled by the AQE mechanism: after the timeout fires, a
`StageFailure` event is posted to the AQE event queue, which triggers a
cleanup that calls the `cancel()` routine of each running query stage
(including the broadcast stage that timed out). A broadcast stage's
`cancel()` stops both the broadcast thread and the job group.
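For illustration, the schedule-a-failure pattern in the diff above can be sketched in isolation. This is a minimal standalone sketch, not Spark's actual code: the message text and the immediate `trySuccess` standing in for the real broadcast future are assumptions made for the example.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration

object TimeoutSketch {
  // A single scheduler thread that fires the timeout, mirroring the
  // scheduledExecutor used by the broadcast stage in the diff.
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def main(args: Array[String]): Unit = {
    val timeoutSecs = 1L
    val promise = Promise[Any]()

    // Schedule a failure after the timeout. tryFailure is a no-op if the
    // promise has already been completed with the real result, so there is
    // no race between the timeout task and a successful completion.
    val fail = scheduler.schedule(new Runnable {
      override def run(): Unit =
        promise.tryFailure(new RuntimeException(
          s"Could not execute broadcast in $timeoutSecs secs."))
    }, timeoutSecs, TimeUnit.SECONDS)

    // Simulate the real future completing before the timeout, then cancel
    // the now-unneeded timeout task.
    promise.trySuccess("broadcast result")
    fail.cancel(false)

    println(Await.result(promise.future, Duration.Inf))
    scheduler.shutdown()
  }
}
```

The key property is that whichever of the two completions happens first wins; the cleanup described above (posting `StageFailure` and calling `cancel()`) is what makes the losing side release its resources.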
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]