vanzin commented on a change in pull request #24072: [SPARK-27112] : Create a 
resource ordering between threads to resolve the deadlocks encountered …
URL: https://github.com/apache/spark/pull/24072#discussion_r265329112
 
 

 ##########
 File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ##########
 @@ -631,29 +641,33 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
       force: Boolean): Seq[String] = {
     logInfo(s"Requesting to kill executor(s) ${executorIds.mkString(", ")}")
 
-    val response = synchronized {
-      val (knownExecutors, unknownExecutors) = 
executorIds.partition(executorDataMap.contains)
-      unknownExecutors.foreach { id =>
-        logWarning(s"Executor to kill $id does not exist!")
-      }
+    // SPARK-27112: We need to ensure that there is ordering of lock 
acquisition
+    // between TaskSchedulerImpl and CoarseGrainedSchedulerBackend objects in 
order to fix
+    // the deadlock issue exposed in SPARK-27112
+    val response = scheduler.synchronized {
+      this.synchronized {
+        val (knownExecutors, unknownExecutors) = 
executorIds.partition(executorDataMap.contains)
+        unknownExecutors.foreach { id =>
+          logWarning(s"Executor to kill $id does not exist!")
+        }
 
-      // If an executor is already pending to be removed, do not kill it again 
(SPARK-9795)
-      // If this executor is busy, do not kill it unless we are told to force 
kill it (SPARK-9552)
-      val executorsToKill = knownExecutors
-        .filter { id => !executorsPendingToRemove.contains(id) }
-        .filter { id => force || !scheduler.isExecutorBusy(id) }
-      executorsToKill.foreach { id => executorsPendingToRemove(id) = 
!countFailures }
+        // If an executor is already pending to be removed, do not kill it 
again (SPARK-9795)
+        // If this executor is busy, do not kill it unless we are told to 
force kill it (SPARK-9552)
+        val executorsToKill = knownExecutors
+          .filter { id => !executorsPendingToRemove.contains(id) }
+          .filter { id => force || !scheduler.isExecutorBusy(id) }
 
 Review comment:
   In a similar vein to my previous comment, although I'm less sure about this 
one.
   
   This seems to be the only interaction with the scheduler in this method, so 
could this filtering be done first thing in the method, with the scheduler lock 
held, and then the rest of the code just needs the 
`CoarseGrainedSchedulerBackend` lock?
   
   It seems to me the behavior wouldn't change from the current state (where 
the internal scheduler state can change while this  method is running). And as 
in the other case, easier to understand things when you're only holding one 
lock.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to