skonto commented on a change in pull request #24796: [SPARK-27900][CORE] Add
uncaught exception handler to the driver
URL: https://github.com/apache/spark/pull/24796#discussion_r290532870
##########
File path: core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
##########
@@ -204,6 +204,11 @@ private [util] class SparkShutdownHookManager {
hooks.synchronized { hooks.remove(ref) }
}
+ def clear(): Unit = {
Review comment:
@srowen here it is:
https://gist.github.com/skonto/74181e434a727901d4f3323461c1050b
I commented out the `clear` call. One other (independent) thing I noticed is
that the main thread is also stuck here:
https://github.com/apache/spark/blob/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L736
Blocking forever there could be a problem if something goes wrong.
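To make the concern concrete, here is a minimal sketch of the difference between an unbounded and a bounded wait. It assumes the linked line is an indefinite wait on a completion future. `BoundedWait`, `awaitWithDeadline`, and the 200 ms timeout are hypothetical names and values chosen for illustration, not Spark code.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object BoundedWait {
  // Hypothetical helper: wait for `f`, but give up after `timeout`
  // instead of blocking forever. Returns true if `f` completed in time.
  def awaitWithDeadline[T](f: Future[T], timeout: FiniteDuration): Boolean =
    try {
      Await.ready(f, timeout)
      true
    } catch {
      case _: TimeoutException => false
    }

  def main(args: Array[String]): Unit = {
    // A promise that is never completed stands in for a job whose
    // completion event is never delivered.
    val neverCompleted = Promise[Unit]().future
    // Await.ready(neverCompleted, Duration.Inf) would block this thread forever.
    val finished = awaitWithDeadline(neverCompleted, 200.millis)
    println(s"finished in time: $finished")
  }
}
```

With a finite timeout the caller at least gets control back and can decide whether to retry, fail the job, or keep waiting.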
Now if you check the output:
```
"Thread-1" #10 prio=5 os_prio=0 tid=0x000055d323902000 nid=0x7c in Object.wait() [0x00007fdccd08a000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000000ebe00e50> (a org.apache.spark.util.EventLoop$$anon$1)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2100)
```
This will never finish. It waits at the join for `dag-scheduler-event-loop`,
which is itself blocked:
"dag-scheduler-event-loop" #45 daemon prio=5 os_prio=0 tid=0x000055d323a25000 nid=0x48 in Object.wait() [0x00007fdccd6d2000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00000000eb4f3b58> (a org.apache.hadoop.util.ShutdownHookManager$1)
    at java.lang.Thread.join(Thread.java:1326)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:107)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked <0x00000000eb3848b8> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(Runtime.java:109)
    at java.lang.System.exit(System.java:971)
    at org.apache.spark.util.SparkUncaughtExceptionHandler.sysExit(SparkUncaughtExceptionHandler.scala:35)
    at org.apache.spark.util.SparkUncaughtExceptionHandler.uncaughtException(SparkUncaughtExceptionHandler.scala:53)
    at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1057)
    at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1052)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)
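
Read together, the two dumps look like a join cycle: one thread joins the event loop thread, while the event loop thread, inside `System.exit`, joins a shutdown hook thread. The sketch below models that cycle with two plain threads. It assumes that `Thread-1` runs as part of the hook being joined, which the dumps suggest but do not show directly. `JoinCycle`, `run`, and the thread names are hypothetical, and the bounded `join(waitMs)` and the interrupts exist only so the demo itself terminates.

```scala
import java.util.concurrent.CountDownLatch

object JoinCycle {
  // Starts two threads that each join the other and reports, after waiting
  // up to `waitMs` for each, whether either finished: (hookFinished, loopFinished).
  def run(waitMs: Long): (Boolean, Boolean) = {
    val bothStarted = new CountDownLatch(1)
    var eventLoop: Thread = null

    // Stand-in for the thread in the first dump: it joins the event loop thread.
    val hook = new Thread(() => {
      try { bothStarted.await(); eventLoop.join() }
      catch { case _: InterruptedException => () }
    }, "hook-stand-in")

    // Stand-in for the event loop thread in the second dump: it joins the hook thread.
    eventLoop = new Thread(() => {
      try { bothStarted.await(); hook.join() }
      catch { case _: InterruptedException => () }
    }, "event-loop-stand-in")

    hook.setDaemon(true)
    eventLoop.setDaemon(true)
    hook.start()
    eventLoop.start()
    // Release both only after both are started: join() returns immediately
    // on a thread that has not been started yet.
    bothStarted.countDown()

    hook.join(waitMs)
    eventLoop.join(waitMs)
    val result = (!hook.isAlive, !eventLoop.isAlive)

    // Break the cycle so the demo can end.
    hook.interrupt()
    eventLoop.interrupt()
    result
  }

  def main(args: Array[String]): Unit =
    println(run(500))
}
```

Neither thread can make progress until something outside the cycle intervenes. In the real process nothing does, since both joins are unbounded.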