[jira] [Commented] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner

2020-01-22 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021692#comment-17021692
 ] 

Hyukjin Kwon commented on SPARK-30488:
--

Is that the only place where the Spark context is created? Can you share a full 
reproducer and the code, if possible? Otherwise it seems impossible to 
investigate further or reproduce.

> Deadlock between block-manager-slave-async-thread-pool and spark context 
> cleaner
> 
>
> Key: SPARK-30488
> URL: https://issues.apache.org/jira/browse/SPARK-30488
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Rohit Agrawal
> Priority: Major
>
> Deadlock happens while cleaning up the spark context. Here is the full thread 
> dump:
>  
>   
>   2020-01-10T20:13:16.2884057Z Full thread dump Java HotSpot(TM) 64-Bit 
> Server VM (25.221-b11 mixed mode):
> 2020-01-10T20:13:16.2884392Z 
> 2020-01-10T20:13:16.2884660Z "SIGINT handler" #488 daemon prio=9 os_prio=2 
> tid=0x111fa000 nid=0x4794 waiting for monitor entry 
> [0x1c86e000]
> 2020-01-10T20:13:16.2884807Z java.lang.Thread.State: BLOCKED (on object 
> monitor)
> 2020-01-10T20:13:16.2884879Z at java.lang.Shutdown.exit(Shutdown.java:212)
> 2020-01-10T20:13:16.2885693Z - waiting to lock <0xc0155de0> (a 
> java.lang.Class for java.lang.Shutdown)
> 2020-01-10T20:13:16.2885840Z at 
> java.lang.Terminator$1.handle(Terminator.java:52)
> 2020-01-10T20:13:16.2885965Z at sun.misc.Signal$1.run(Signal.java:212)
> 2020-01-10T20:13:16.2886329Z at java.lang.Thread.run(Thread.java:748)
> 2020-01-10T20:13:16.2886430Z 
> 2020-01-10T20:13:16.2886752Z "Thread-3" #108 prio=5 os_prio=0 
> tid=0x111f7800 nid=0x48cc waiting for monitor entry 
> [0x2c33f000]
> 2020-01-10T20:13:16.2886881Z java.lang.Thread.State: BLOCKED (on object 
> monitor)
> 2020-01-10T20:13:16.2886999Z at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownHooksInOrder(ShutdownHookManager.java:273)
> 2020-01-10T20:13:16.2887107Z at 
> org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:121)
> 2020-01-10T20:13:16.2887212Z at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
> 2020-01-10T20:13:16.2887421Z 
> 2020-01-10T20:13:16.2887798Z "block-manager-slave-async-thread-pool-81" #486 
> daemon prio=5 os_prio=0 tid=0x111fe800 nid=0x2e34 waiting for monitor 
> entry [0x2bf3d000]
> 2020-01-10T20:13:16.2889192Z java.lang.Thread.State: BLOCKED (on object 
> monitor)
> 2020-01-10T20:13:16.2889305Z at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:404)
> 2020-01-10T20:13:16.2889405Z - waiting to lock <0xc1f359f0> (a 
> sbt.internal.LayeredClassLoader)
> 2020-01-10T20:13:16.2889482Z at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:411)
> 2020-01-10T20:13:16.2889582Z - locked <0xca33e4c8> (a 
> sbt.internal.ManagedClassLoader$ZombieClassLoader)
> 2020-01-10T20:13:16.2889659Z at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 2020-01-10T20:13:16.2890881Z at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:58)
> 2020-01-10T20:13:16.2891006Z at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
> 2020-01-10T20:13:16.2891142Z at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
> 2020-01-10T20:13:16.2891260Z at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:86)
> 2020-01-10T20:13:16.2891375Z at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 2020-01-10T20:13:16.2891624Z at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 2020-01-10T20:13:16.2891737Z at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-01-10T20:13:16.2891833Z at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-01-10T20:13:16.2891925Z at java.lang.Thread.run(Thread.java:748)
> 2020-01-10T20:13:16.2891967Z 
> 2020-01-10T20:13:16.2892066Z "pool-31-thread-16" #335 prio=5 os_prio=0 
> tid=0x153b2000 nid=0x1aac waiting on condition [0x4b2ff000]
> 2020-01-10T20:13:16.2892147Z java.lang.Thread.State: WAITING (parking)
> 2020-01-10T20:13:16.2892241Z at sun.misc.Unsafe.park(Native Method)
> 2020-01-10T20:13:16.2892328Z - parking to wait for <0xc9cad078> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 2020-01-10T20:13:16.2892437Z at 
> 

[jira] [Commented] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner

2020-01-14 Thread Rohit Agrawal (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015308#comment-17015308
 ] 

Rohit Agrawal commented on SPARK-30488:
---

[~ajithshetty]

We use the following to create the Spark context:

 

SparkSession.builder().config(finalSparkConf).getOrCreate()


[jira] [Commented] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner

2020-01-11 Thread Ajith S (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013651#comment-17013651
 ] 

Ajith S commented on SPARK-30488:
-

From the log, I see `sbt` classes in the deadlocked threads. This is related to 
the internal classloaders in sbt, which was fixed in sbt 1.3.3 by marking the 
classloaders as parallel capable: [https://github.com/sbt/sbt/pull/5131]. See 
also a similar issue: [https://github.com/sbt/sbt/issues/5116].
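
For readers unfamiliar with the term, here is a minimal Java sketch of what 
"parallel capable" changes. This is not sbt's code; the class names 
SerialLoader and ParallelLoader and the helper locksOnSelf are hypothetical and 
exist only for illustration. It relies on the documented JDK behaviour of 
ClassLoader.getClassLoadingLock: a loader that is not registered as parallel 
capable synchronizes class loading on the loader instance itself, while a 
registered one uses a dedicated lock object per class name.

```java
public class ParallelCapableDemo {

    /** Hypothetical loader that never registers as parallel capable. */
    static class SerialLoader extends ClassLoader {
        /** True when class loading would synchronize on this loader instance. */
        boolean locksOnSelf(String className) {
            return getClassLoadingLock(className) == this;
        }
    }

    /** Hypothetical loader that registers as parallel capable. */
    static class ParallelLoader extends ClassLoader {
        static {
            // Must run before any instance is constructed, hence the
            // static initializer.
            ClassLoader.registerAsParallelCapable();
        }

        /** True when class loading would synchronize on this loader instance. */
        boolean locksOnSelf(String className) {
            return getClassLoadingLock(className) == this;
        }
    }

    public static void main(String[] args) {
        // The class name is arbitrary; no class is actually loaded here.
        String name = "com.example.Foo";
        System.out.println("serial loader locks on itself:   "
                + new SerialLoader().locksOnSelf(name));
        System.out.println("parallel loader locks on itself: "
                + new ParallelLoader().locksOnSelf(name));
    }
}
```

With a loader-wide lock, two threads that enter a chain of delegating loaders 
from different ends can each hold one loader's monitor while waiting for the 
other's, which matches the "locked ... ZombieClassLoader" / "waiting to lock 
... LayeredClassLoader" frames in the dump. Per-class-name locks avoid holding 
the whole loader while delegating.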

 

[~rohit21agrawal] Thanks for reporting this. One question: can you also please 
mention how the SparkContext was created?

 
