Ngone51 commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-819255421
> From BlockManager - e.g. reportBlockStatus. Just modifying HeartbeatReceiver won't solve the re-registration issue here. We will also have to implement a similar kind of tracking inside BlockManagerMasterEndpoint. And now since both tracking are independent of each other, it might introduce some race condition (please correct me if I am wrong).

You're right. I followed the PR description only, so I thought `HeartbeatReceiver` was the only problematic place.

I checked the code and was surprised to find that we don't remove the `BlockManager` when we remove an executor. Removing a `BlockManager` happens in only a few cases:

* the corresponding executor of the `BlockManager` caused a shuffle fetch failure
  https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2081
* an executor is removed redundantly
  https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L434
* a newly registered `BlockManager` evicts an old one (if any)
  https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L534-L537

If that's the case (it doesn't seem correct, but it has existed for a long time already), I think posting the `SparkListenerBlockManagerAdded` inside the `if (!blockManagerInfo.contains(id))` block would be enough for the whole fix?

https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L531-L562
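A minimal sketch of the suggestion in the comment above, assuming a much simplified model of the registration path. Everything here is a hypothetical stand-in and not Spark's actual code: `MasterSketch`, the `BlockManagerId` case class, `BlockManagerAdded`, and the `events` buffer (which stands in for posting to the listener bus) are made up for illustration. Only the `if (!blockManagerInfo.contains(id))` guard and the "new registration evicts an old one" behavior come from the comment.

```scala
import scala.collection.mutable

// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
final case class BlockManagerId(executorId: String, host: String, port: Int)
final case class BlockManagerAdded(id: BlockManagerId)

class MasterSketch {
  // Hypothetical counterparts of the endpoint's bookkeeping.
  private val blockManagerInfo = mutable.HashSet.empty[BlockManagerId]
  private val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  // Stand-in for the listener bus: records every "added" event that is posted.
  val events = mutable.ArrayBuffer.empty[BlockManagerAdded]

  def register(id: BlockManagerId): Unit = {
    if (!blockManagerInfo.contains(id)) {
      // A newly registered BlockManager evicts an old one on the same executor, if any.
      blockManagerIdByExecutor.get(id.executorId).foreach(remove)
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo += id
      // Suggested change: post the event only inside this guard, so a
      // re-registration of an already known id produces no duplicate event.
      events += BlockManagerAdded(id)
    }
  }

  def remove(id: BlockManagerId): Unit = {
    if (blockManagerInfo.remove(id)) {
      blockManagerIdByExecutor.remove(id.executorId)
    }
  }
}
```

In this model, calling `register` twice with the same id records one event, while registering a different id for the same executor evicts the old entry and records a second event.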
