sumeetgajjar commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-827048420


   > So I think the solution now would be:
   > 
   > For the heartbeat, using [#32114 
(review)](https://github.com/apache/spark/pull/32114#pullrequestreview-634766324).
   > 
   > For the`Blokmanager`, adding a `BlockManagerEndpointSharedState` (as 
mentioned by @sumeetgajjar in [#32114 
(comment)](https://github.com/apache/spark/pull/32114#discussion_r612814973)) 
for both `BlockManagerMasterEndpoint` and 
`BlockManagerMasterHeartbeatEndpoint`. It's true that if we only adding a 
removal state to the `blockManagerInfo`, we have to filter the removed 
blockmanagers first before traversing it (e.g., we won't expect to return a 
removed blockmanager in `getPeers`).
   > 
   > In `BlockManagerEndpointSharedState`, we'd have both 
`activeBlockManagerInfo` and the `removedBlockManagerInfo`. We don't have to 
set up a new cleaner to clear the `removedBlockManagerInfo`. Instead, we can 
reuse the fix of `HeartbeatReceiver` as whenever there's a sure removal in 
`HeartbeatReceiver`, we can send a removal request to 
`BlockManagerEndpointSharedState` as well by following the code path of 
`!scheduler.executorHeartbeatReceived`(e.g., we could have 
`scheduler.clearBlockManagerInfo` similarly).
   > 
   > WDYT?
   
   @Ngone51 Thank you for this suggestion, I understand the solution, however, 
I believe this might be slight complex to keep track of things since the 
cleanup/removal is triggered from a different component i.e. 
`HeartbeatReceiver`. 
   
   I spoke to @attilapiros offline regarding his solution and it seems he 
missed to mention one thing that `BlockManagerInfo` will not be removed here: 
https://github.com/apache/spark/blob/1b609c7dcfc3a30aefff12a71aac5c1d6273b2c0/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L344
   
   Insterad, the cleanup thread would take care of removal. This will keep the 
whole logic in the same component i.e. `BlockManagerMasterEndpoint` and would 
be easy to track from a code understanding point of view.
   
   I will proceed with @attilapiros 
[proposal](https://github.com/apache/spark/pull/32114#discussion_r612373538) 
where we model BlockManager removal as a new state, run some tests and update 
more. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to