[jira] [Commented] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner
[ https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021692#comment-17021692 ]

Hyukjin Kwon commented on SPARK-30488:
--

Is that the only place where the context is created? Can you share a full reproducer and the code if possible? Otherwise it seems impossible to investigate further or reproduce.

> Deadlock between block-manager-slave-async-thread-pool and spark context cleaner
>
> Key: SPARK-30488
> URL: https://issues.apache.org/jira/browse/SPARK-30488
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Rohit Agrawal
> Priority: Major
>
> A deadlock occurs while cleaning up the Spark context. Here is the full thread dump:
>
> 2020-01-10T20:13:16.2884057Z Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.221-b11 mixed mode):
> 2020-01-10T20:13:16.2884392Z
> 2020-01-10T20:13:16.2884660Z "SIGINT handler" #488 daemon prio=9 os_prio=2 tid=0x111fa000 nid=0x4794 waiting for monitor entry [0x1c86e000]
> 2020-01-10T20:13:16.2884807Z    java.lang.Thread.State: BLOCKED (on object monitor)
> 2020-01-10T20:13:16.2884879Z    at java.lang.Shutdown.exit(Shutdown.java:212)
> 2020-01-10T20:13:16.2885693Z    - waiting to lock <0xc0155de0> (a java.lang.Class for java.lang.Shutdown)
> 2020-01-10T20:13:16.2885840Z    at java.lang.Terminator$1.handle(Terminator.java:52)
> 2020-01-10T20:13:16.2885965Z    at sun.misc.Signal$1.run(Signal.java:212)
> 2020-01-10T20:13:16.2886329Z    at java.lang.Thread.run(Thread.java:748)
> 2020-01-10T20:13:16.2886430Z
> 2020-01-10T20:13:16.2886752Z "Thread-3" #108 prio=5 os_prio=0 tid=0x111f7800 nid=0x48cc waiting for monitor entry [0x2c33f000]
> 2020-01-10T20:13:16.2886881Z    java.lang.Thread.State: BLOCKED (on object monitor)
> 2020-01-10T20:13:16.2886999Z    at org.apache.hadoop.util.ShutdownHookManager.getShutdownHooksInOrder(ShutdownHookManager.java:273)
> 2020-01-10T20:13:16.2887107Z    at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:121)
> 2020-01-10T20:13:16.2887212Z    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
> 2020-01-10T20:13:16.2887421Z
> 2020-01-10T20:13:16.2887798Z "block-manager-slave-async-thread-pool-81" #486 daemon prio=5 os_prio=0 tid=0x111fe800 nid=0x2e34 waiting for monitor entry [0x2bf3d000]
> 2020-01-10T20:13:16.2889192Z    java.lang.Thread.State: BLOCKED (on object monitor)
> 2020-01-10T20:13:16.2889305Z    at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
> 2020-01-10T20:13:16.2889405Z    - waiting to lock <0xc1f359f0> (a sbt.internal.LayeredClassLoader)
> 2020-01-10T20:13:16.2889482Z    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
> 2020-01-10T20:13:16.2889582Z    - locked <0xca33e4c8> (a sbt.internal.ManagedClassLoader$ZombieClassLoader)
> 2020-01-10T20:13:16.2889659Z    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 2020-01-10T20:13:16.2890881Z    at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:58)
> 2020-01-10T20:13:16.2891006Z    at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
> 2020-01-10T20:13:16.2891142Z    at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
> 2020-01-10T20:13:16.2891260Z    at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:86)
> 2020-01-10T20:13:16.2891375Z    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 2020-01-10T20:13:16.2891624Z    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 2020-01-10T20:13:16.2891737Z    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-01-10T20:13:16.2891833Z    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-01-10T20:13:16.2891925Z    at java.lang.Thread.run(Thread.java:748)
> 2020-01-10T20:13:16.2891967Z
> 2020-01-10T20:13:16.2892066Z "pool-31-thread-16" #335 prio=5 os_prio=0 tid=0x153b2000 nid=0x1aac waiting on condition [0x4b2ff000]
> 2020-01-10T20:13:16.2892147Z    java.lang.Thread.State: WAITING (parking)
> 2020-01-10T20:13:16.2892241Z    at sun.misc.Unsafe.park(Native Method)
> 2020-01-10T20:13:16.2892328Z    - parking to wait for <0xc9cad078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 2020-01-10T20:13:16.2892437Z    at
[ https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015308#comment-17015308 ]

Rohit Agrawal commented on SPARK-30488:
--

[~ajithshetty] We use the following to create the Spark context:

SparkSession.builder().config(finalSparkConf).getOrCreate()
[ https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013651#comment-17013651 ]

Ajith S commented on SPARK-30488:
--

From the log, I see `sbt` classes in the deadlocked threads. This is related to sbt's internal classloaders and was fixed in sbt 1.3.3 by marking the classloaders as parallel capable: [https://github.com/sbt/sbt/pull/5131]. A similar issue: [https://github.com/sbt/sbt/issues/5116]

[~rohit21agrawal] Thanks for reporting this. One question: could you also mention how the SparkContext was created?
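The "parallel capable" mechanism referenced in the sbt 1.3.3 fix above comes down to calling `java.lang.ClassLoader.registerAsParallelCapable()` from a static initializer of the loader class: once registered, the JVM synchronizes `loadClass` on a per-class-name lock object instead of on the loader instance itself, which removes the loader-monitor lock ordering that deadlocks in the dump. A minimal sketch of the mechanism — the class name and structure here are illustrative, not sbt's actual code:

```java
// Illustrative sketch (not sbt's code): a classloader that opts into
// parallel class loading, the mechanism behind the sbt 1.3.3 fix.
public class ParallelCapableLoader extends ClassLoader {
    // Must run before any instance is created; returns true when this class
    // and all of its superclasses are registered as parallel capable
    // (java.lang.ClassLoader itself already is).
    static final boolean REGISTERED = ClassLoader.registerAsParallelCapable();

    public ParallelCapableLoader(ClassLoader parent) {
        super(parent);
    }

    public static void main(String[] args) throws Exception {
        ParallelCapableLoader loader =
            new ParallelCapableLoader(ParallelCapableLoader.class.getClassLoader());
        // For a parallel-capable loader, loadClass locks a per-name object
        // rather than the loader's own monitor, so two threads loading through
        // a chain of delegating loaders cannot block on each other's monitors.
        Class<?> c = loader.loadClass("java.lang.String");
        System.out.println("registered=" + REGISTERED + ", loaded=" + c.getName());
    }
}
```

Without this registration, `loadClass` holds the monitor of each loader in the delegation chain, which is exactly the pattern in the dump: one thread has locked `ManagedClassLoader$ZombieClassLoader` and is waiting on `LayeredClassLoader`, while shutdown-time threads hold or wait on locks in the opposite order.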