skonto commented on a change in pull request #24796: [SPARK-27900][CORE] Add 
uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#discussion_r290520361
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
 ##########
 @@ -204,6 +204,11 @@ private [util] class SparkShutdownHookManager {
     hooks.synchronized { hooks.remove(ref) }
   }
 
+  def clear(): Unit = {
 
 Review comment:
   The scenario is described in the JIRA. I see a deadlock. The DAG event loop 
thread dies because it tries to schedule a large number of tasks, gets an OOM, 
and becomes unresponsive.
   The shutdown hook then blocks, with the DAG scheduler unable to finish, 
when the scheduler stop logic waits here: 
https://github.com/apache/spark/blob/ecfdffcb3560e21ccd318de6a0c614fa0c3aabf5/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L81
   
   In general, I am not confident that we can predict the state of the JVM 
after an OOM, or what can go wrong, well enough to do a safe shutdown. 
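
   To make the hazard concrete, here is a minimal, standalone sketch of the 
general pattern: a stop routine that joins a thread which never exits. It is 
not Spark's code and not what this PR does. It assumes the wait at the linked 
line is an unbounded `Thread.join()` (the line is not quoted here), and the 
names `BoundedJoinSketch`, `unresponsiveThread`, and `stopWithTimeout` are 
hypothetical. The sketch uses a bounded join only so that the demonstration 
itself terminates.

```scala
import java.util.concurrent.CountDownLatch

object BoundedJoinSketch {

  // Hypothetical stand-in for an event-loop thread that no longer makes
  // progress: it waits on a latch and swallows every interrupt.
  def unresponsiveThread(release: CountDownLatch): Thread = {
    val t = new Thread(new Runnable {
      override def run(): Unit = {
        var done = false
        while (!done) {
          try {
            release.await()
            done = true
          } catch {
            case _: InterruptedException => // ignore the interrupt, stay stuck
          }
        }
      }
    }, "stuck-event-loop")
    t.setDaemon(true) // do not keep the JVM alive because of this demo thread
    t.start()
    t
  }

  // Interrupt the thread and wait at most timeoutMs (must be > 0, since
  // join(0) waits forever). Returns true if the thread has terminated.
  def stopWithTimeout(t: Thread, timeoutMs: Long): Boolean = {
    t.interrupt()
    t.join(timeoutMs)
    !t.isAlive
  }

  def main(args: Array[String]): Unit = {
    val release = new CountDownLatch(1)
    val t = unresponsiveThread(release)

    // An unbounded t.join() here would never return; the bounded variant
    // gives up and reports that the thread is still alive.
    println(s"stopped while stuck: ${stopWithTimeout(t, 200L)}")

    // Once the thread is allowed to finish, the same call succeeds.
    release.countDown()
    t.join()
    println(s"stopped after release: ${stopWithTimeout(t, 200L)}")
  }
}
```

   A bounded join only prevents the caller from hanging. It says nothing 
about whether the JVM is in a usable state after an OOM, which is the concern 
raised above.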

