vanzin commented on a change in pull request #24072: [SPARK-27112] : Create a
resource ordering between threads to resolve the deadlocks encountered …
URL: https://github.com/apache/spark/pull/24072#discussion_r265374148
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -631,29 +641,33 @@ class CoarseGrainedSchedulerBackend(scheduler:
TaskSchedulerImpl, val rpcEnv: Rp
force: Boolean): Seq[String] = {
logInfo(s"Requesting to kill executor(s) ${executorIds.mkString(", ")}")
- val response = synchronized {
- val (knownExecutors, unknownExecutors) =
executorIds.partition(executorDataMap.contains)
- unknownExecutors.foreach { id =>
- logWarning(s"Executor to kill $id does not exist!")
- }
+ // SPARK-27112: We need to ensure that there is ordering of lock
acquisition
+ // between TaskSchedulerImpl and CoarseGrainedSchedulerBackend objects in
order to fix
+ // the deadlock issue exposed in SPARK-27112
+ val response = scheduler.synchronized {
+ this.synchronized {
+ val (knownExecutors, unknownExecutors) =
executorIds.partition(executorDataMap.contains)
+ unknownExecutors.foreach { id =>
+ logWarning(s"Executor to kill $id does not exist!")
+ }
- // If an executor is already pending to be removed, do not kill it again
(SPARK-9795)
- // If this executor is busy, do not kill it unless we are told to force
kill it (SPARK-9552)
- val executorsToKill = knownExecutors
- .filter { id => !executorsPendingToRemove.contains(id) }
- .filter { id => force || !scheduler.isExecutorBusy(id) }
- executorsToKill.foreach { id => executorsPendingToRemove(id) =
!countFailures }
+ // If an executor is already pending to be removed, do not kill it
again (SPARK-9795)
+ // If this executor is busy, do not kill it unless we are told to
force kill it (SPARK-9552)
+ val executorsToKill = knownExecutors
+ .filter { id => !executorsPendingToRemove.contains(id) }
+ .filter { id => force || !scheduler.isExecutorBusy(id) }
Review comment:
(Caught up with the previous discussion and it seems that here both locks
are needed to avoid an edge case where you could kill active executors.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]