sujith71955 opened a new pull request #24036: [SPARK-27036][SPARK-SQL] Cancel the running jobs in the background if broadcast future timeout error occurs
URL: https://github.com/apache/spark/pull/24036

## What changes were proposed in this pull request?

Currently, even when the broadcast thread times out, its jobs are not aborted and keep running in the background.

In the current design, the broadcast future waits, up to the timeout, for the result of the job that is to be broadcast. When the broadcast future times out, the job's tasks are not killed and continue to run in the background.

As a solution, we get the jobs for the execution id from the app-status store and cancel them before throwing the future timeout exception. This terminates the job and its tasks promptly when the timeout exception occurs, and it frees the resources that would otherwise stay in use after the driver has thrown the exception.

After the fix, the Spark web UI shows the jobs as failed once the timeout error occurs.

## How was this patch tested?

Manually.

Before fix

```
scala> spark.sqlContext.setConf("spark.sql.broadcastTimeout","2")

scala> val df1 = spark.range(0,10000,1,10000).selectExpr("id%10000 as key1", "id as value1")
df1: org.apache.spark.sql.DataFrame = [key1: bigint, value1: bigint]

scala> val df2 = spark.range(0,10000,1,10000).selectExpr("id%10000 as key2", "id as value2")
df2: org.apache.spark.sql.DataFrame = [key2: bigint, value2: bigint]

scala> val inner = df1.join(df2,col("key1")===col("key2")).select(col("key1"),col("value2")).collect
```

Actual result: The timeout exception is thrown, but the tasks keep running in the background. The Spark web UI also shows the task execution as in progress, and once execution finishes the job status is shown as successful. Please refer to the attachments for more details.

After fix: Once the timeout occurs, the job is cancelled, and the web UI shows the job status as failed.
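
The cancel-before-rethrow idea described above can be illustrated with plain JVM futures. This is only a minimal sketch, not the code in this patch: the patch looks up Spark jobs by execution id in the app-status store and cancels them, whereas here thread interruption stands in for job cancellation, and the names `BroadcastTimeoutSketch` and `awaitOrCancel` are hypothetical.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

// Hypothetical helper, for illustration only; not part of Spark.
object BroadcastTimeoutSketch {

  /** Runs `body` on a background thread and waits at most `timeoutMs` for it.
    * If the wait times out, the background work is cancelled (interrupted)
    * before the TimeoutException is rethrown, so it does not keep running.
    */
  def awaitOrCancel[T](timeoutMs: Long)(body: => T): T = {
    val pool = Executors.newSingleThreadExecutor()
    val future = pool.submit(new Callable[T] { def call(): T = body })
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS)
    } catch {
      case e: TimeoutException =>
        // Stand-in for cancelling the Spark jobs of the timed-out execution:
        // interrupt the worker first, then surface the timeout to the caller.
        future.cancel(true)
        throw e
    } finally {
      // Accept no new work; the worker thread exits once its task ends.
      pool.shutdown()
    }
  }
}
```

Without the `future.cancel(true)` call, the caller would still see the `TimeoutException`, but the background work would run to completion, which mirrors the "before fix" behaviour reported here.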
