sumeetgajjar commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-827048420
> So I think the solution now would be: > > For the heartbeat, using [#32114 (review)](https://github.com/apache/spark/pull/32114#pullrequestreview-634766324). > > For the`Blokmanager`, adding a `BlockManagerEndpointSharedState` (as mentioned by @sumeetgajjar in [#32114 (comment)](https://github.com/apache/spark/pull/32114#discussion_r612814973)) for both `BlockManagerMasterEndpoint` and `BlockManagerMasterHeartbeatEndpoint`. It's true that if we only adding a removal state to the `blockManagerInfo`, we have to filter the removed blockmanagers first before traversing it (e.g., we won't expect to return a removed blockmanager in `getPeers`). > > In `BlockManagerEndpointSharedState`, we'd have both `activeBlockManagerInfo` and the `removedBlockManagerInfo`. We don't have to set up a new cleaner to clear the `removedBlockManagerInfo`. Instead, we can reuse the fix of `HeartbeatReceiver` as whenever there's a sure removal in `HeartbeatReceiver`, we can send a removal request to `BlockManagerEndpointSharedState` as well by following the code path of `!scheduler.executorHeartbeatReceived`(e.g., we could have `scheduler.clearBlockManagerInfo` similarly). > > WDYT? @Ngone51 Thank you for this suggestion, I understand the solution, however, I believe this might be slight complex to keep track of things since the cleanup/removal is triggered from a different component i.e. `HeartbeatReceiver`. I spoke to @attilapiros offline regarding his solution and it seems he missed to mention one thing that `BlockManagerInfo` will not be removed here: https://github.com/apache/spark/blob/1b609c7dcfc3a30aefff12a71aac5c1d6273b2c0/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L344 Insterad, the cleanup thread would take care of removal. This will keep the whole logic in the same component i.e. `BlockManagerMasterEndpoint` and would be easy to track from a code understanding point of view. I will proceed with @attilapiros [proposal](https://github.com/apache/spark/pull/32114#discussion_r612373538) where we model BlockManager removal as a new state, run some tests and update more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
