sujith71955 opened a new pull request #24036: [SPARK-27036][SPARK-SQL] Cancel the running jobs in the background if broadcast future timeout error occurs
URL: https://github.com/apache/spark/pull/24036
 
 
   ## What changes were proposed in this pull request?
   Currently, even when the broadcast future times out, the jobs it was waiting on are not aborted and keep running in the background.
   In the current design, the driver waits on the broadcast future, up to the timeout, for the result of the job whose output is to be broadcast. When that wait times out, the job's tasks are not killed and continue running in the background.
   
   As a solution, we look up the jobs associated with the query's execution ID in the app status store and cancel them before rethrowing the future timeout exception. This terminates the job and its tasks promptly when the timeout occurs, and it also stops the job from consuming resources after the driver has already thrown the timeout exception.
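The cancel-before-rethrow pattern described above can be sketched in plain Scala, without Spark. This is a minimal illustration under stated assumptions, not the code in this PR: `BackgroundJob`, `awaitOrCancel` and `BroadcastTimeoutSketch` are hypothetical names, and a cancellation flag checked between work steps stands in for Spark's job cancellation (in the PR, the jobs to cancel are found via the execution ID in the app status store).

```scala
import java.util.concurrent.{CancellationException, TimeoutException}
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a long-running job: it works in small steps and
// checks a cancellation flag between steps.
final class BackgroundJob(totalSteps: Int) {
  private val cancelled = new AtomicBoolean(false)
  @volatile private var completed = 0

  def cancel(): Unit = cancelled.set(true)
  def isCancelled: Boolean = cancelled.get()
  def stepsCompleted: Int = completed

  // Runs until all steps are done, or stops with an exception once cancelled.
  def run(): Int = {
    while (completed < totalSteps) {
      if (cancelled.get()) throw new CancellationException("job cancelled")
      Thread.sleep(10) // simulate one unit of work
      completed += 1
    }
    completed
  }
}

object BroadcastTimeoutSketch {
  // Waits for the job's result with a timeout. On timeout it cancels the job
  // first and then rethrows, so the caller still sees the TimeoutException
  // but the work no longer keeps running in the background.
  def awaitOrCancel(job: BackgroundJob, timeout: FiniteDuration): Int = {
    val result = Future(job.run())
    try Await.result(result, timeout)
    catch {
      case e: TimeoutException =>
        job.cancel()
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    val job = new BackgroundJob(totalSteps = 1000) // about 10 s if never cancelled
    val outcome =
      try s"finished after ${awaitOrCancel(job, 100.millis)} steps"
      catch { case _: TimeoutException => s"timed out; cancelled = ${job.isCancelled}" }
    println(outcome) // prints: timed out; cancelled = true
  }
}
```

Cancelling before rethrowing leaves the caller's behaviour unchanged (it still receives the `TimeoutException`) while ensuring the abandoned work stops; Spark delivers the cancellation through its scheduler rather than through a hand-rolled flag like the one assumed here.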
   
   After the fix, the Spark web UI shows the jobs as failed once the timeout error occurs.
   
   ## How was this patch tested?
   
   Tested manually.
   
   Before the fix:
   ```scala
   scala> spark.sqlContext.setConf("spark.sql.broadcastTimeout","2")
   scala> val df1 = spark.range(0,10000,1,10000).selectExpr("id%10000 as key1", "id as value1")
   df1: org.apache.spark.sql.DataFrame = [key1: bigint, value1: bigint]
   
   scala> val df2 = spark.range(0,10000,1,10000).selectExpr("id%10000 as key2", "id as value2")
   df2: org.apache.spark.sql.DataFrame = [key2: bigint, value2: bigint]
   
   scala> val inner = df1.join(df2,col("key1")===col("key2")).select(col("key1"),col("value2")).collect
   ```
   Actual result: a timeout exception is thrown, but the tasks keep running in the background. The Spark web UI also shows the task execution as in progress, and once execution finishes, the job status is shown as successful. See the attached screenshots for details.
   
![image](https://user-images.githubusercontent.com/12999161/54067638-53b65100-4268-11e9-97dc-66c1a81e308e.png)
    Web UI
   
![broadcast_fished](https://user-images.githubusercontent.com/12999161/54067669-9aa44680-4268-11e9-800c-a70077fcd2e5.PNG)
   
   After the fix: once the timeout occurs, the job is cancelled and the web UI displays its status as failed.
   
![brcast_fix](https://user-images.githubusercontent.com/12999161/54067676-b7d91500-4268-11e9-8b65-0a9a48006327.PNG)
   
   Web UI
   
![brdast_fix2](https://user-images.githubusercontent.com/12999161/54067679-c293aa00-4268-11e9-8894-e0f663bd3039.PNG)
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
