dhruve commented on issue #24035: [SPARK-27112] : Spark Scheduler encounters two independent Deadlocks … URL: https://github.com/apache/spark/pull/24035#issuecomment-472135426

I think fixing the lock ordering for the involved threads will solve the issue. The current order in which each thread acquires locks is:

TaskResultGetter:
- Lock YarnClusterScheduler
- Lock CoarseGrainedSchedulerBackend

DispatcherEventLoop:
- Lock CoarseGrainedSchedulerBackend
- Lock YarnClusterScheduler

SparkDynamicExecutorAllocation:
- Lock ExecutorAllocationManager
- Lock CoarseGrainedSchedulerBackend
- Lock TaskSchedulerImpl/YarnClusterScheduler

Solution: The methods producing the deadlock are driven by activity in the CoarseGrainedSchedulerBackend.

1. `killExecutors`: The only check that requires the lock on TSI/YCS is whether an executor is busy. We can move the idle-executor check ahead of the `synchronized` block on CGSB. This fixes the lock order for the dynamic allocation thread.
2. `makeOffers`: This currently acquires the lock on CGSB to ensure executors are not killed while a task is being offered on them, and eventually calls `resourceOffers` on the scheduler, which is where it acquires the second lock. I agree with @attilapiros's suggestion to fix this second ordering issue by synchronizing on the scheduler first and then on the backend. (See the sketch below.)

These two changes should align the lock-acquisition order across all three threads and seem simple to reason about. I think this should solve the issue, but it would be good to have more contributors eyeball this change.
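To make the intended ordering concrete, here is a minimal Scala sketch of the two fixes. `SchedulerSketch` and `BackendSketch` are hypothetical stand-ins for YarnClusterScheduler/TaskSchedulerImpl and CoarseGrainedSchedulerBackend, not the actual Spark classes; the point is only that every path ends up acquiring scheduler before backend.

```scala
// Hypothetical sketch of the proposed lock ordering; SchedulerSketch and
// BackendSketch stand in for TSI/YCS and CGSB and are not the Spark source.

class SchedulerSketch {
  // Stand-in for the executor-busy bookkeeping guarded by the scheduler lock.
  def isExecutorIdle(id: String): Boolean = true
}

class BackendSketch(scheduler: SchedulerSketch) {

  // Fix 2: take the scheduler lock first, then the backend lock, so this path
  // acquires monitors in the same order as TaskResultGetter (scheduler -> backend).
  def makeOffers(): Unit = scheduler.synchronized {
    val offers = this.synchronized {
      // ... exclude executors pending removal and build the worker offers ...
      Seq.empty[String]
    }
    // resourceOffers would be invoked here, already under the scheduler lock.
  }

  // Fix 1: evaluate the idle-executor check (which needs the scheduler lock)
  // before entering the backend's synchronized block, so the dynamic-allocation
  // thread also acquires scheduler -> backend instead of backend -> scheduler.
  def killExecutors(executorIds: Seq[String]): Seq[String] = {
    val idle = scheduler.synchronized {
      executorIds.filter(scheduler.isExecutorIdle)
    }
    this.synchronized {
      // ... record `idle` executors as pending-to-remove and request the kill ...
      idle
    }
  }
}
```

With both changes, the only remaining order anywhere is ExecutorAllocationManager -> scheduler -> backend, so no cycle between the two monitors can form.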
