Ngone51 commented on pull request #34043:
URL: https://github.com/apache/spark/pull/34043#issuecomment-925463156


   > Given there was no MT-safety reason to delegate to DAGScheduler, I was 
   > thinking along the lines of @f-thiele's change: we already have other 
   > Future invocations in BlockManagerMasterEndpoint.
   
   I think delegating to DAGScheduler is an alternative way to decouple 
   BlockManagerMasterEndpoint and MapOutputTracker, which should also work in this 
   case. The current fix looks good to me with the read/write lock removed. I'll 
   approve it so that it makes the 3.2 branch cut.
   
   > When looking at the affected codepaths, BlockManager does not currently 
   > differentiate between block IDs in tryToReportBlockStatus when it updates 
   > the driver (both for normal updates and for block migration).
   > Until node decommissioning, this path was used only for non-shuffle blocks, IIRC.
   >
   > IMO handling the shuffle vs. non-shuffle split while processing 
   > UpdateBlockInfo at the driver is cleaner than spreading it out to the caller, 
   > unless we have other issues in doing so (for example MT-safety, correctness, 
   > or design hygiene concerns).
   > Thoughts?
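For comparison, here is a minimal sketch of the driver-side split described in the quote above. Everything in it is hypothetical and simplified: the block ID classes, the fields of `UpdateBlockInfo`, and `DriverBlockInfoHandler` are stand-ins rather than Spark's real classes, and the `shuffleOutputLocations` map merely stands in for MapOutputTracker state.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for block IDs (not Spark's real classes).
sealed trait BlockId
final case class RDDBlockId(rddId: Int, split: Int) extends BlockId
final case class ShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId

// Hypothetical generic message: one "block status" update for every kind of block.
final case class UpdateBlockInfo(blockId: BlockId, executorId: String)

// Hypothetical driver-side handler: the shuffle vs. non-shuffle split lives here,
// so callers never need to know which kind of block they are reporting.
final class DriverBlockInfoHandler {
  // Locations of non-shuffle blocks.
  val blockLocations = mutable.Map.empty[BlockId, String]
  // Stand-in for map output tracker state, keyed by (shuffleId, mapId).
  val shuffleOutputLocations = mutable.Map.empty[(Int, Long), String]

  def handle(msg: UpdateBlockInfo): Unit = msg.blockId match {
    case ShuffleDataBlockId(shuffleId, mapId, _) =>
      // Shuffle blocks update map-output state instead of the generic table.
      shuffleOutputLocations((shuffleId, mapId)) = msg.executorId
    case other =>
      blockLocations(other) = msg.executorId
  }
}
```

The appeal of this shape is that a single message type covers every caller; the cost is that the block-location endpoint has to know about shuffle bookkeeping.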
   
   I actually think it's cleaner to spread it out to the caller. It's confusing 
   that shuffle blocks are handled as a special case compared to other blocks. 
   Ideally, I think we should send an RPC message (e.g., a new message called 
   `UpdatedShuffleBlockLocation`) directly to `MapOutputTrackerMasterEndpoint`.
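A minimal sketch of what the caller-side routing could look like, assuming a dedicated message as suggested above. Only the name `UpdatedShuffleBlockLocation` comes from this comment; its fields, the simplified block IDs, `RecordingEndpoint` (a stand-in for an RPC endpoint reference), and `BlockReporter` are all hypothetical and are not Spark's actual API.

```scala
import scala.collection.mutable

// Hypothetical, simplified block IDs (not Spark's real classes).
sealed trait BlockId
final case class RDDBlockId(rddId: Int, split: Int) extends BlockId
final case class ShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId

// The generic update, plus a dedicated shuffle message (fields are assumptions).
final case class UpdateBlockInfo(blockId: BlockId, executorId: String)
final case class UpdatedShuffleBlockLocation(shuffleId: Int, mapId: Long, executorId: String)

// Stand-in for an RPC endpoint reference that just records what it receives.
final class RecordingEndpoint(val name: String) {
  val received = mutable.ArrayBuffer.empty[Any]
  def send(msg: Any): Unit = { received += msg; () }
}

// Hypothetical caller-side reporting: the executor decides which endpoint to notify,
// so shuffle blocks never pass through the block manager master at all.
final class BlockReporter(
    blockManagerMaster: RecordingEndpoint,
    mapOutputTrackerMaster: RecordingEndpoint,
    executorId: String) {

  def report(blockId: BlockId): Unit = blockId match {
    case ShuffleDataBlockId(shuffleId, mapId, _) =>
      mapOutputTrackerMaster.send(UpdatedShuffleBlockLocation(shuffleId, mapId, executorId))
    case other =>
      blockManagerMaster.send(UpdateBlockInfo(other, executorId))
  }
}
```

With this split, BlockManagerMasterEndpoint no longer needs a reference to MapOutputTracker, which is the decoupling discussed earlier in this thread.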
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
