pgandhi999 commented on issue #24035: [SPARK-27112] : Spark Scheduler encounters two independent Deadlocks …
URL: https://github.com/apache/spark/pull/24035#issuecomment-471633756
 
 
@attilapiros I have attached the stack traces in text format here:

Deadlock between the task-result-getter thread and the spark-dynamic-executor-allocation thread:
   
```
=============================
"task-result-getter-0":
  waiting to lock monitor 0x00007f35dcf25cb8 (object 0x00000004404f2518, a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend),
  which is held by "spark-dynamic-executor-allocation"
"spark-dynamic-executor-allocation":
  waiting to lock monitor 0x00007f35dc20f1f8 (object 0x00000004404f25c0, a org.apache.spark.scheduler.cluster.YarnClusterScheduler),
  which is held by "task-result-getter-0"

Java stack information for the threads listed above:
===================================================
"task-result-getter-0":
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:603)
        - waiting to lock <0x00000004404f2518> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
        at org.apache.spark.scheduler.BlacklistTracker.org$apache$spark$scheduler$BlacklistTracker$$killBlacklistedExecutor(BlacklistTracker.scala:155)
        at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:247)
        at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:226)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at org.apache.spark.scheduler.BlacklistTracker.updateBlacklistForSuccessfulTaskSet(BlacklistTracker.scala:226)
        at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
        at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet(TaskSetManager.scala:530)
        at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:787)
        at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:466)
        - locked <0x00000004404f25c0> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:113)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2004)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

"spark-dynamic-executor-allocation":
        at org.apache.spark.scheduler.TaskSchedulerImpl.isExecutorBusy(TaskSchedulerImpl.scala:647)
        - waiting to lock <0x00000004404f25c0> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$$anonfun$9.apply(CoarseGrainedSchedulerBackend.scala:613)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$$anonfun$9.apply(CoarseGrainedSchedulerBackend.scala:613)
        at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
        at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
        at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:613)
        - locked <0x00000004404f2518> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
        at org.apache.spark.ExecutorAllocationManager.removeExecutors(ExecutorAllocationManager.scala:481)
        - locked <0x00000004442fb590> (a org.apache.spark.ExecutorAllocationManager)
        at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:321)
        - locked <0x00000004442fb590> (a org.apache.spark.ExecutorAllocationManager)
        at org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:246)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
```
   
Deadlock between the task-result-getter thread and the dispatcher-event-loop thread:
   
```
Found one Java-level deadlock:
=============================
"task-result-getter-2":
  waiting to lock monitor 0x00007f9be88b2678 (object 0x00000003c0720ed0, a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend),
  which is held by "dispatcher-event-loop-23"
"dispatcher-event-loop-23":
  waiting to lock monitor 0x00007f9bf077abb8 (object 0x00000003c0720f78, a org.apache.spark.scheduler.cluster.YarnClusterScheduler),
  which is held by "task-result-getter-2"

Java stack information for the threads listed above:
===================================================
"task-result-getter-2":
     at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:604)
     - waiting to lock <0x00000003c0720ed0> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
     at org.apache.spark.scheduler.BlacklistTracker.killExecutor(BlacklistTracker.scala:153)
     at org.apache.spark.scheduler.BlacklistTracker.org$apache$spark$scheduler$BlacklistTracker$$killBlacklistedExecutor(BlacklistTracker.scala:163)
     at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:257)
     at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:236)
     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
     at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
     at org.apache.spark.scheduler.BlacklistTracker.updateBlacklistForSuccessfulTaskSet(BlacklistTracker.scala:236)
     at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
     at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
     at scala.Option.foreach(Option.scala:257)
     at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet(TaskSetManager.scala:530)
     at org.apache.spark.scheduler.TaskSetManager.handleFailedTask(TaskSetManager.scala:916)
     at org.apache.spark.scheduler.TaskSchedulerImpl.handleFailedTask(TaskSchedulerImpl.scala:539)
     - locked <0x00000003c0720f78> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
     at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:150)
     at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply(TaskResultGetter.scala:132)
     at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply(TaskResultGetter.scala:132)
     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2005)
     at org.apache.spark.scheduler.TaskResultGetter$$anon$4.run(TaskResultGetter.scala:132)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
"dispatcher-event-loop-23":
     at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:321)
     - waiting to lock <0x00000003c0720f78> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
     at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:248)
     - locked <0x00000003c0720ed0> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
     at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:136)
     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
     at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
```
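The second dump is the same inversion from the other side: dispatcher-event-loop-23 holds the backend monitor in makeOffers and waits for the scheduler monitor in resourceOffers, while task-result-getter-2 holds the scheduler monitor in handleFailedTask and waits for the backend monitor in killExecutors. Independently of the exact change in this PR, the generic ways out are either a single global lock order or moving the cross-object call outside the synchronized section. A sketch of the second option, again with placeholder names rather than the real Spark code:

```scala
// Sketch of one generic remedy: record the work under the local monitor,
// then make the cross-object call only after that monitor is released.
// "Scheduler", "Backend" and pendingKills are illustrative placeholders,
// not the actual change proposed in this PR.
import scala.collection.mutable

class Backend {
  def killExecutors(ids: Seq[String]): Unit = synchronized {
    // Free to take other locks here: the caller holds no scheduler monitor.
  }
}

class Scheduler(backend: Backend) {
  private val pendingKills = mutable.Buffer.empty[String]

  def handleFailedTask(execId: String): Unit = {
    // Phase 1: mutate scheduler state while holding only the scheduler monitor.
    val toKill: Seq[String] = synchronized {
      pendingKills += execId
      val snapshot = pendingKills.toList
      pendingKills.clear()
      snapshot
    }
    // Phase 2: cross-object call with no monitor held, so no lock cycle can form.
    if (toKill.nonEmpty) backend.killExecutors(toKill)
  }
}

object TwoPhaseDemo {
  def main(args: Array[String]): Unit = {
    val scheduler = new Scheduler(new Backend)
    scheduler.handleFailedTask("exec-7")
    println("kill request issued outside the scheduler lock")
  }
}
```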
