GitHub user brad-kaiser commented on the issue:
https://github.com/apache/spark/pull/19041
Hi @squito,
The back-and-forth communication between CacheRecoveryManager and
BlockManagerMasterEndpoint is there so that we always have an up-to-date view of
which executors are undergoing cache recovery, and so that we don't replicate
blocks to those executors. If you look at recoverLatestBlock, we include the
contents of the recoveringExecutors cache.
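
As a rough illustration of that exchange, here is a small Python sketch. It is
not the code in this PR: every class and method name other than
recoverLatestBlock and recoveringExecutors is hypothetical, and treating the
recovering set as an exclusion list on the endpoint side is an assumption made
for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class MasterEndpoint:
    """Hypothetical stand-in for BlockManagerMasterEndpoint."""
    block_locations: Dict[str, Set[str]]  # block id -> executors holding it
    executors: Set[str]                   # all live executors

    def replicate_one_block(self, source: str,
                            excluded: FrozenSet[str]) -> Optional[str]:
        """Copy one block held only by `source` to a non-excluded executor."""
        targets = sorted(self.executors - excluded - {source})
        if not targets:
            return None  # nowhere safe to put a replica
        for block_id, holders in sorted(self.block_locations.items()):
            if holders == {source}:
                holders.add(targets[0])
                return block_id
        return None  # nothing left to recover from `source`


@dataclass
class RecoveryManager:
    """Hypothetical stand-in for CacheRecoveryManager."""
    master: MasterEndpoint
    recovering_executors: Set[str] = field(default_factory=set)

    def start_recovery(self, executor_id: str) -> None:
        self.recovering_executors.add(executor_id)

    def recover_latest_block(self, executor_id: str) -> Optional[str]:
        # Send a fresh snapshot of the recovering set with every request,
        # so the endpoint never picks a recovering executor as a target.
        snapshot = frozenset(self.recovering_executors)
        return self.master.replicate_one_block(executor_id, snapshot)

    def finish_recovery(self, executor_id: str) -> None:
        self.recovering_executors.discard(executor_id)


master = MasterEndpoint(
    block_locations={"b1": {"e1"}, "b2": {"e2"}},
    executors={"e1", "e2", "e3"},
)
manager = RecoveryManager(master)
manager.start_recovery("e1")
manager.start_recovery("e2")
moved = manager.recover_latest_block("e1")  # e2 is skipped as a target
```

Because the snapshot is taken per request, an executor that starts recovery
between two calls is excluded from the very next replication.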
We could conceivably move that cache into BlockManagerMasterEndpoint, but I
think that would end up being messier. I wanted to keep all the cache recovery
code localized and not clutter up BlockManagerMasterEndpoint.
CacheRecoveryManager and BlockManagerMasterEndpoint will also be local to the
same process, so RPC calls between them should be cheap, especially compared to
the time it will take to copy blocks around.
I will look into the race between removing the block and replicating the
next block.
Thanks