Ngone51 commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-819255421


   > From BlockManager - e.g. reportBlockStatus.
   Just modifying HeartbeatReceiver won't solve the re-registration issue here. 
We will also have to implement a similar kind of tracking inside 
BlockManagerMasterEndpoint. And now since both tracking are independent of each 
other, it might introduce some race condition (please correct me if I am wrong).
   
   You're right. I only followed the PR description, so I thought 
`HeartbeatReceiver` was the only problematic place.
   
   
   I checked the code and was surprised to find that we don't remove the 
`BlockManager` when we remove an executor. Removing a `BlockManager` happens 
in only a few cases:
   
   * the executor that owns the `BlockManager` caused a shuffle fetch 
failure
   
https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2081
   * an executor is removed redundantly
   
https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L434
   * a newly registered `BlockManager` evicts an old one (if any)
   
https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L534-L537
   
   If that's the case (it seems incorrect, but it has existed for a long time 
already), I think posting the `SparkListenerBlockManagerAdded` inside the 
`if (!blockManagerInfo.contains(id))` block would be enough for the whole fix?
   
   
https://github.com/apache/spark/blob/ee7d838aaf46f9d786e0388915b422fb78952893/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L531-L562
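
To make the suggestion concrete, here is a rough, self-contained sketch of the idea. It is not the actual `BlockManagerMasterEndpoint` code: `MasterSketch`, `register`, `posted`, and the simplified `BlockManagerId` and `BlockManagerAdded` types are hypothetical stand-ins. The only point it illustrates is that the "added" event is posted inside the `contains` check, so a repeated registration of an already-known `BlockManager` posts nothing.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the real Spark types.
final case class BlockManagerId(executorId: String, host: String, port: Int)
final case class BlockManagerAdded(time: Long, id: BlockManagerId)

final class MasterSketch {
  // Mirrors the role of blockManagerInfo: which BlockManagers are known.
  private val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, Long]
  // Tracks the current BlockManager per executor so an old one can be evicted.
  private val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]
  // Stand-in for the listener bus: records every event that would be posted.
  val posted = mutable.ArrayBuffer.empty[BlockManagerAdded]

  def register(id: BlockManagerId, now: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      // A newly registered BlockManager evicts an old one on the same executor.
      blockManagerIdByExecutor.get(id.executorId).foreach(blockManagerInfo.remove)
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = now
      // The event is posted only when the BlockManager was not known before.
      posted += BlockManagerAdded(now, id)
    }
    // A re-registration of a known id falls through without posting.
  }
}
```

With this shape, calling `register` twice with the same id records a single event, while registering a different id for the same executor evicts the old entry and records a second event.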
   
   


