vanzin commented on a change in pull request #23842: [SPARK-26927]Fix race
condition may cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#discussion_r263240842
##########
File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
##########
@@ -725,10 +740,15 @@ private[spark] class ExecutorAllocationManager(
if (stageIdToNumRunningTask.contains(stageId)) {
stageIdToNumRunningTask(stageId) += 1
}
- // This guards against the race condition in which the `SparkListenerTaskStart`
- // event is posted before the `SparkListenerBlockManagerAdded` event, which is
- // possible because these events are posted in different threads. (see SPARK-4951)
- if (!allocationManager.executorIds.contains(executorId)) {
+ // This guards against the following race condition:
+ // 1. The `SparkListenerTaskStart` event is posted before the
+ // `SparkListenerExecutorAdded` event
+ // 2. The `SparkListenerExecutorRemoved` event is posted before the
+ // `SparkListenerTaskStart` event
+ // Above cases are possible because these events are posted in different threads.
+ // (see SPARK-4951 SPARK-26927)
+ if (!allocationManager.executorIds.contains(executorId) &&
+ !allocationManager.removedExecutorIds.contains(executorId)) {
Review comment:
You didn't understand my suggestion.
I'm asking you to, instead of keeping a list of executors that have been
removed here, just ask the scheduler whether that executor exists. The
scheduler keeps the authoritative list of executors.
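
A rough sketch of what that suggestion could look like, not the actual patch. `ExecutorRegistry`, its `isExecutorActive` method, and `TaskStartTracker` are hypothetical stand-ins for the scheduler and the listener state; only `executorIds` comes from the diff above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the scheduler, which holds the
// authoritative view of which executors currently exist.
trait ExecutorRegistry {
  def isExecutorActive(executorId: String): Boolean
}

// Simplified, hypothetical stand-in for the listener state in
// ExecutorAllocationManager; only `executorIds` mirrors the diff.
class TaskStartTracker(registry: ExecutorRegistry) {
  val executorIds = mutable.HashSet[String]()

  // Handles a task-start event for the given executor.
  def onTaskStart(executorId: String): Unit = {
    // Track the executor only if it is not yet known here AND the
    // scheduler still reports it as existing. An executor that was
    // already removed is skipped without keeping a local
    // `removedExecutorIds` set.
    if (!executorIds.contains(executorId) &&
        registry.isExecutorActive(executorId)) {
      executorIds += executorId
    }
  }
}

object TaskStartTrackerDemo {
  def main(args: Array[String]): Unit = {
    // Assumed scheduler state: only "exec-1" is still alive.
    val live = Set("exec-1")
    val registry = new ExecutorRegistry {
      def isExecutorActive(executorId: String): Boolean = live.contains(executorId)
    }
    val tracker = new TaskStartTracker(registry)

    tracker.onTaskStart("exec-1") // task start seen before the executor was added
    tracker.onTaskStart("exec-2") // task start seen after the executor was removed

    println(tracker.executorIds)
  }
}
```

With this shape, the listener needs no extra bookkeeping for removed executors, since the liveness question is delegated to the component that already owns the answer.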