Ngone51 commented on issue #23842: [SPARK-26927]Fix race condition may cause dynamic allocation not working
URL: https://github.com/apache/spark/pull/23842#issuecomment-468523250

So, to make the solution more deterministic rather than relying on the assumption that `the late events for the executor will be correctly handled before the removal`, I have a new idea: maintain `taskStartExecutorIds` and `removedExecutorIds` side by side.

When we receive a TaskStartEvent with executor id X:
* if X is in `removedExecutorIds`, this TaskStartEvent arrived after the ExecutorRemovedEvent, so we clear X from `removedExecutorIds`;
* if X is not in `removedExecutorIds`, we store X in `taskStartExecutorIds`.

When we receive an ExecutorRemovedEvent with executor id Y:
* if Y is not in `taskStartExecutorIds`, this ExecutorRemovedEvent arrived before the TaskStartEvent, so we store Y in `removedExecutorIds`;
* if Y is in `taskStartExecutorIds`, we clear Y from `taskStartExecutorIds`.

The remaining thing we need to care about is what happens if `taskStartExecutorIds` or `removedExecutorIds` becomes full. Let's take `removedExecutorIds` as an example. Hmmm, if we follow the solution above, I think it's extremely unlikely for `removedExecutorIds` to fill up, since we clear entries from it intermittently. But it is still theoretically possible for `removedExecutorIds` to fill up if plenty of TaskStartEvents are blocked for a very long time. If that really happens, I'd choose to fail the entire ExecutorAllocationManager immediately.
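To make the bookkeeping concrete, here is a minimal sketch of the two-set idea in Python. It is not Spark code: the class `LateEventTracker`, its method names, the returned status strings, and the `capacity` bound are all hypothetical, and raising `RuntimeError` only stands in for "fail the entire ExecutorAllocationManager". It also assumes each executor id needs at most one entry per set, which the proposal above does not spell out.

```python
class LateEventTracker:
    """Hypothetical helper illustrating the proposal; not part of Spark."""

    def __init__(self, capacity):
        # Assumption: both sets share one fixed capacity.
        self.capacity = capacity
        self.task_start_executor_ids = set()
        self.removed_executor_ids = set()

    def _add_bounded(self, ids, executor_id, name):
        # Fail fast when a set is full, mirroring the "fail immediately" choice.
        if executor_id not in ids and len(ids) >= self.capacity:
            raise RuntimeError(f"{name} is full")
        ids.add(executor_id)

    def on_task_start(self, executor_id):
        if executor_id in self.removed_executor_ids:
            # The task start arrived after the executor removal: clear the entry.
            self.removed_executor_ids.discard(executor_id)
            return "late_task_start"
        self._add_bounded(
            self.task_start_executor_ids, executor_id, "taskStartExecutorIds"
        )
        return "task_start_recorded"

    def on_executor_removed(self, executor_id):
        if executor_id in self.task_start_executor_ids:
            # The removal arrived after the task start: clear the entry.
            self.task_start_executor_ids.discard(executor_id)
            return "removed_after_task_start"
        # The removal arrived before the task start: remember it.
        self._add_bounded(
            self.removed_executor_ids, executor_id, "removedExecutorIds"
        )
        return "removal_recorded"
```

With this sketch, a removal followed by a late task start for the same id leaves both sets empty, and so does a task start followed by a removal. Only a run of unmatched events can push either set toward its capacity.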
