skonto commented on a change in pull request #24796: [SPARK-27900][CORE] Add 
uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#discussion_r290520361
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
 ##########
 @@ -204,6 +204,11 @@ private [util] class SparkShutdownHookManager {
     hooks.synchronized { hooks.remove(ref) }
   }
 
+  def clear(): Unit = {
 
 Review comment:
  The scenario is described in the JIRA. I see a deadlock: the DAG event loop 
thread tries to schedule a large number of tasks, hits an OOM, and becomes 
unresponsive (it never wakes from the interrupt). I can point to the full 
jstack output if we need to dig further into this.
  Then the shutdown hook blocks without the DAG scheduler ever being able to 
finish; the scheduler stop logic waits here: 
https://github.com/apache/spark/blob/ecfdffcb3560e21ccd318de6a0c614fa0c3aabf5/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L81
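  To make the shape of the hang concrete, here is a minimal, self-contained 
sketch (the names are illustrative, not the actual Spark classes): a thread 
that never terminates makes `stop()` hang on `join()`, which is exactly how 
the shutdown hook ends up waiting on the wedged event loop thread.

```scala
object JoinDeadlockSketch {
  // Stand-in for the DAG event loop thread: once "wedged" it never finishes,
  // no matter how often it is interrupted.
  private val eventThread = new Thread("sketch-event-loop") {
    override def run(): Unit = {
      while (true) {
        try Thread.sleep(Long.MaxValue)
        catch { case _: InterruptedException => () } // stays wedged, never exits
      }
    }
  }

  def start(): Unit = eventThread.start()

  // Same shape as EventLoop.stop(): interrupt, then join. With a wedged
  // thread the join never returns, so whoever called stop() (here: the
  // shutdown hook) blocks forever.
  def stop(): Unit = {
    eventThread.interrupt()
    eventThread.join()
  }
}
```

  Calling `start()` and then `stop()` from a shutdown hook reproduces the 
hang: the JVM never exits on its own.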
   
  In general I am not confident we can predict the state of the JVM after an 
OOM, or what else can go wrong while attempting a safe shutdown; that is why 
exiting is one good option (maybe not the only one). 
