Ngone51 commented on pull request #34043: URL: https://github.com/apache/spark/pull/34043#issuecomment-925463156
> Given there was no MT-safety reason to delegate to DAGScheduler, I was thinking along the lines of @f-thiele's change: we already have other Future invocations in BlockManagerMasterEndpoint.

I think delegating to DAGScheduler is an alternative way to decouple BlockManagerMasterEndpoint and MapOutputTracker, which should also work in this case. The current fix looks good to me after removing the read/write lock. I'll approve it so that it can make the 3.2 cut.

> When looking at the affected codepaths, BlockManager is not currently differentiating between the block ids in tryToReportBlockStatus when it is updating the driver (normal and for block migration). Until node decommissioning, this path was used for non-shuffle blocks iirc.
>
> IMO handling the shuffle vs non-shuffle split while processing UpdateBlockInfo at the driver is cleaner than spreading it out to the caller, unless we have other issues in doing so (for example: MT-safety, correctness, design hygiene, etc. concerns). Thoughts?

I actually think it's cleaner to spread it out at the caller. It's confusing that shuffle blocks are handled specially compared to other blocks. Ideally, I think we should send the RPC msg (e.g., having a new message called `UpdatedShuffleBlockLocation`) to `MapOutputTrackerMasterEndpoint` directly.
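To make the caller-side split concrete, here is a rough sketch of the idea. It is a toy model, not Spark code: the `Toy*` classes, the simplified `BlockId` hierarchy, and the fields of `UpdatedShuffleBlockLocation` are all assumptions made for illustration, and the real endpoints communicate over RPC rather than direct method calls.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-ins for block ids; simplified, not Spark's real BlockId classes.
sealed trait BlockId
final case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
final case class ShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int) extends BlockId

// Simplified version of the existing message for non-shuffle blocks.
final case class UpdateBlockInfo(executorId: String, blockId: BlockId)

// Hypothetical message named in the comment above; its fields are assumed here.
final case class UpdatedShuffleBlockLocation(executorId: String, blockId: ShuffleBlockId)

// Toy endpoint that only ever sees non-shuffle block updates.
final class ToyBlockManagerMasterEndpoint {
  val received: ArrayBuffer[UpdateBlockInfo] = ArrayBuffer.empty
  def receive(msg: UpdateBlockInfo): Unit = received.append(msg)
}

// Toy endpoint that receives shuffle block location updates directly.
final class ToyMapOutputTrackerMasterEndpoint {
  val received: ArrayBuffer[UpdatedShuffleBlockLocation] = ArrayBuffer.empty
  def receive(msg: UpdatedShuffleBlockLocation): Unit = received.append(msg)
}

// The caller decides which endpoint to notify, based on the block id type.
final class ToyBlockManager(
    executorId: String,
    blockMaster: ToyBlockManagerMasterEndpoint,
    mapOutputTracker: ToyMapOutputTrackerMasterEndpoint) {

  def reportBlockStatus(blockId: BlockId): Unit = blockId match {
    case shuffle: ShuffleBlockId =>
      mapOutputTracker.receive(UpdatedShuffleBlockLocation(executorId, shuffle))
    case other =>
      blockMaster.receive(UpdateBlockInfo(executorId, other))
  }
}
```

With this shape, the block manager master endpoint never needs to know about shuffle blocks, which is the decoupling argued for above. The trade-off is that every caller reporting block status has to go through the same routing logic.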
